CS 105 Lab 5 Code Optimization See Calendar for Dates 1 Introduction This assignment deals with optimizing memory intensive code Image processing offers many examples of functions that can benefit from optimization In this lab we will consider two image processing operations rotate which rotates an image counter clockwise by 90 and smooth which smooths or blurs an image For this lab we will consider an image to be represented as a two dimensional matrix M where Mi j denotes the value of i j th pixel of M Pixel values are triples of red green and blue RGB values We will only consider square images Let N denote the number of rows or columns of an image Rows and columns are numbered in C style from 0 to N 1 Given this representation the rotate operation can be implemented quite simply as the combination of the following two matrix operations Transpose For each i j pair Mi j and Mj i are interchanged Exchange rows Row i is exchanged with row N 1 i This combination is illustrated in Figure 1 The smooth operation is implemented by replacing every pixel value with the average of all the pixels around it in a maximum of 3 3 window centered at that pixel Consider Figure 2 The values of pixels M2 1 1 and M2 N 1 N 1 are given below M2 1 1 P2 M2 N 1 N 1 i 0 P2 j 0 M1 i j 9 PN 1 i N 2 PN 1 j N 2 M1 i j 4 1 i j 0 0 Rotate by 90 i counter clockwise j 0 0 i 0 0 Exchange Rows Transpose j Figure 1 Rotation of an image by 90 counterclockwise M1 1 1 M2 1 1 smooth M1 N 1 N 1 M2 N 1 N 1 Figure 2 Smoothing an image 2 2 Logistics You are to work in a group of two people in solving the problems for this assignment The only hand in will be electronic Any clarifications and revisions to the assignment will be posted on the lab Web page or course email 3 Handout Instructions The materials for this lab are on the course web page Start by copying perflab handout tar to a protected directory in which you plan to do your work Then give the command tar xvf perflab handout tar This will cause a number of files to be unpacked into the directory The only file you will be modifying and handing in is kernels c The driver c program is a driver program that allows you to evaluate the performance of your solutions Use the command make driver to generate the driver code and run it with the command driver Note that you are not allowed to change the Makefile which also means you are not allowed to fiddle with compiler switches Looking at the file kernels c you ll notice a C structure team into which you should insert the requested identifying information about the two individuals comprising your programming team Do this right away so you don t forget 4 Implementation Overview Data Structures The core data structure deals with image representation A pixel is a struct as shown below typedef struct unsigned short red R value unsigned short green G value unsigned short blue B value pixel As can be seen RGB values have 16 bit representations 16 bit color An image I is represented as a onedimensional array of pixels where the i j th pixel is I RIDX i j n Here n is the dimension of the image matrix and RIDX is a macro defined as follows define RIDX i j n i n j See the file defs h for this code Rotate The following C function computes the result of rotating the source image src by 90 and stores the result in destination image dst dim is the dimension of the image 3 void naive rotate int dim pixel src pixel dst int i j for i 0 i dim i for j 0 j dim j dst RIDX dim 1 j i dim src RIDX i j dim return The above code scans the rows of the source image matrix copying to the columns of the destination image matrix Your task is to rewrite this code to make it run as fast as possible using techniques like code motion loop unrolling and blocking See the file kernels c for this code Smooth The smoothing function takes as input a source image src and returns the smoothed result in the destination image dst Here is part of an implementation void naive smooth int dim pixel src pixel dst int i j for i 0 i dim i for j 0 j dim j dst RIDX i j dim avg dim i j src Smooth the i j th pixel return The function avg returns the average of all the pixels around the i j th pixel Your task is to optimize smooth and avg to run as fast as possible Note The function avg is a local function and you can get rid of it altogether to implement smooth in some other way This code and an implementation of avg is in the file kernels c Performance measures Our main performance measure is CPE or Cycles per Element If a function takes C cycles to run for an image of size N N the CPE value is C N 2 Table 1 summarizes the performance of the naive implementations shown above and compares it against an optimized implementation Performance is shown for 5 different values of N All measurements were made on Wilkes which is a Pentium III Xeon machine The ratios speedups of the optimized implementation over the naive one will constitute a score of your implementation To summarize the overall effect over different values of N we will compute the geometric mean of the results for these 5 values That is if the measured speedups for N 32 64 128 256 512 are R32 R64 R128 R256 and 4 Test case N Method Naive rotate CPE Optimized rotate CPE Speedup naive opt Method Naive smooth CPE Optimized smooth CPE Speedup naive opt N 1 64 22 1 8 0 2 8 32 524 41 5 12 6 2 128 21 6 8 6 2 5 64 525 41 6 12 6 3 256 27 6 14 8 1 9 128 527 41 2 12 8 4 512 79 8 22 1 3 6 256 522 53 5 9 8 5 1024 220 9 25 3 8 7 512 523 56 4 9 3 Geom Mean 3 1 Geom Mean 11 3 Table 1 Sample CPEs and Ratios for Optimized vs Naive Implementations R512 then we compute the overall performance as p R 5 R32 R64 R128 R256 R512 Assumptions To make life easier you can assume that N is a multiple of 32 Your code must run correctly for all such values of N but we will measure its performance only for the 5 values shown in Table 1 note that the CPEs and speedups in this table …
View Full Document
Unlocking...