Snapshot
Sanjay Jhaveri
Mike Huhs
6.111 Final Project

The goal of this final project is to implement a digital camera using a Xilinx Virtex II FPGA that is built into the 6.111 Labkit. The FPGA will interface with a video camera, and output the digital signal to a VGA monitor. The user will then be able to capture the displayed image by pressing a button on the labkit. Once the image has been captured, it will be compressed using a discrete cosine transform (DCT) in order to be stored and viewed at a later time. In addition to image compression, the user will be able to perform zoom and rotate operations as well as perform some simple image filtering. Implementing Snapshot will involve using the onboard ZBT memory of the 6.111 Labkit to store the video signal. This project should be a natural continuation of 6.111 Labs 3 and 4, as it builds upon the concepts of VGA display and memory. The lab will be broken up into: extracting the received video signal, displaying the signal to the monitor, and storing the video signal in the onboard Labkit memory.

Table of Contents
1 Overview
2 High Level Block Diagram
3 Module Descriptions
4 Testing & Debugging
5 Conclusion

List of Figures
Figure 1: User Interface
Figure 2: DCT Matrix Coefficients
Figure 3: Matlab Results
Figure 4: Digital Camera Block Diagram
Figure 5: NTSC Vertical Timing Reference
Figure 6: Timing Pipeline
Figure 7: Image Compression & Storage Block Diagram
Figure 8: Test Benches

Overview

1.1 Introduction

Digital cameras use aggressive compression techniques to be able to store large image files on relatively little memory.
These compression techniques significantly reduce the amount of memory required to store an image while sacrificing relatively little image quality. The purpose of this project was to implement a digital camera capable of storing multiple images on a 4 MB SRAM through discrete cosine transform (DCT) image compression. This was to be done using an NTSC video camera, a Xilinx XC2V6000 Labkit, and an XVGA monitor displaying the user interface for the camera.

1.2 User Interface

The user interface for the camera is shown in Figure 1. The user sees a live video feed from the video camera and uses a mouse and three buttons to operate the camera. The "Shoot!" button captures an image and stores it to memory, the "View" button lets the user view the pictures in memory by replacing the video feed with a stored picture, and the "Live" button takes the user back to the live video feed to shoot another picture.

[Figure 1: User Interface, showing the live video, the mouse cursor, and the Shoot, View, and Live mode buttons]

1.3 Image Compression Background

Due to the limited memory resources available on the 6.111 labkit, it was only possible to store a single uncompressed video frame in each of the two available ZBT SRAMs. To realize the full functionality of a digital camera, it was therefore necessary to compress the image prior to storage in memory. We attempted to implement a highly simplified JPEG compression scheme, which relies upon the discrete cosine transform to decompose an image into its various frequency components. A 2D DCT can be applied to an image by separating the image into 8 x 8 pixel blocks and applying the transform to each of these blocks separately. The 2D DCT can be viewed as a series of matrix multiplications in which the pixel matrix (M) is first multiplied by the DCT matrix (T) and then by its transpose (T') according to the equation D = TMT', where the DCT coefficients are calculated according to the figure below.
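Figure 2 is not reproduced in this text. Assuming it shows the standard orthonormal 8-point DCT-II matrix (an assumption, although it is the usual choice and it is the form for which the transpose T' acts as the inverse of T), the entries of T would be:

```latex
T_{ij} =
\begin{cases}
\dfrac{1}{\sqrt{N}}, & i = 0,\\[6pt]
\sqrt{\dfrac{2}{N}}\,\cos\!\left(\dfrac{(2j+1)\,i\,\pi}{2N}\right), & i = 1,\dots,N-1,
\end{cases}
\qquad j = 0,\dots,N-1,\quad N = 8.
```

Here i indexes the row (frequency) and j the column (sample position); both symbols are introduced for this sketch and do not appear in the original report.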
[Figure 2: DCT Matrix Coefficients]

The values of the resulting transformed matrix (D) represent the different frequency components of the input image. The DC value is located in the first row and column, while the higher frequency components are located toward the lower right indices. With normal JPEG encoding, this matrix is divided by a quantization matrix and rounded in order to force as many terms to zero as possible, and the resulting matrix is then run-length encoded along the diagonals. However, because the eye is less sensitive to high frequencies, and because the high frequency coefficients are often much smaller than the low frequency coefficients, it is possible to achieve image compression by storing only the low frequency components to memory. The image can then be decompressed by replacing the discarded high frequency terms in the transformed matrix with zeros and applying the IDCT using the equation M = T'DT, which is simply the DCT in reverse.

In order to assess the effectiveness of this method, we used MATLAB to compute the error resulting from applying this process to a sample 8 x 8 pixel matrix. When we applied the 2D DCT to the sample matrix and stored only the first 5 rows and columns of the transformed matrix, the resulting decompressed matrix had an average pixel error of 7.25, which would be almost undetectable to the human eye considering a range of 256 possible values. However, it should be noted that the error can vary greatly depending upon the frequency spectrum of a particular image block, and images with lots of sharp features will be affected much more dramatically by the compression process than images with smoother color gradients. Additionally, the calculations performed by MATLAB can be expected
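The compress-and-reconstruct experiment described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: T is taken to be the standard orthonormal DCT-II matrix, all function names are hypothetical, and the test block is a made-up smooth gradient rather than the report's sample matrix, so the error it prints will not match the 7.25 figure.

```python
import math

N = 8  # block size used in the report


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix T (assumed form; the report's Figure 2 is not available)."""
    t = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        t.append([scale * math.cos((2 * j + 1) * i * math.pi / (2 * n))
                  for j in range(n)])
    return t


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block, t):
    return matmul(matmul(t, block), transpose(t))    # D = T M T'


def idct2(coeffs, t):
    return matmul(matmul(transpose(t), coeffs), t)   # M = T' D T


def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient outside the top-left keep x keep corner."""
    return [[c if (i < keep and j < keep) else 0.0
             for j, c in enumerate(row)] for i, row in enumerate(coeffs)]


def mean_abs_error(a, b):
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Hypothetical test block: a smooth diagonal gradient with values in 0..224.
block = [[float(16 * (r + c)) for c in range(N)] for r in range(N)]

T = dct_matrix()
D = dct2(block, T)

err_full = mean_abs_error(block, idct2(D, T))                            # nothing discarded
err_keep5 = mean_abs_error(block, idct2(keep_low_frequencies(D, 5), T))  # 25 of 64 kept
err_keep1 = mean_abs_error(block, idct2(keep_low_frequencies(D, 1), T))  # DC term only

print("average pixel error keeping a 5 x 5 corner: %.4f" % err_keep5)
```

Keeping a 5 x 5 corner stores 25 of the 64 coefficients per block. Swapping in a block with sharp edges (for example, a checkerboard) should show the larger errors the text warns about.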