DOC PREVIEW
U of I CS 232 - Goal

This preview shows page 1 out of 2 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 2 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 2 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS232 Fall 2006 VTune Lab November 13, 2006GoalThe goal of this section is to get you familiar with the VTune analyzer for Performance Optimization.Recall from lecture that the most efficient methodology for optimizing programs is as follows:1. Write a working version of the program.2. Create suitable benchmarks and collect data for them.3. Identify “hotspots” and fine-tune those areas of the code.4. If the performance objectives have not been met, go back to Step 3.In this section, you will implement this methodology. A working version of the code (p1.c) and a suitablebenchmark (datafile.txt) will be provided. Your task is to use VTune to identify hotspots and determinehow to optimize the code. Do not attempt to read and understand the whole code. Instead, focus onlyon the slow segments (or functions) identified by VTune.Getting started1. Please seat yourself at one of the machines csil-core1 through csil-core25. If you can’t find afree machine, work in groups of two.2. Open a terminal and type the following commands:mkdir vtunecd vtunecp ~kumar/vtune/* .//opt/intel/vtune/bin/vtlec &The last command will start the VTune analyzer in the background. Make sure your current directorycontains the following files: p1.c, p2.c, p3.c, datafile.txt and Makefile.3. If something isn’t working, let us know. Otherwise, you can start working on Problem 1.Problem 1a: Identifying the “Hotspot”Compile the file p1.c with the command make p1. This will create an executable file which you can runwith the command p1. The program takes about 4 seconds to run.Using VTune:1. Click “Tuning”→“New Activity” and seclect the First Use Wizard. Click “Next”.2. For setting the Application to launch, click “Browse” and select the vtune/p1 program.3. Set the Working di rectory by clicking “Browse” and selecting Home followed by vtune.4. Click “Finish” to run the First Use Activity. Do not click anything until the “Most ActiveFunctions in Your Application” tab is displayed.As you can see, the SortArray function takes up almost all the time. Click on the “SortArray” link toview the source code. Scroll down until you have identified the slowest code-segment within this function(refer to the “Clockticks” column to see where the code is spending most of its time).1CS232 Fall 2006 VTune Lab November 13, 2006Problem 1b: The Call GraphA quick check should convince you that p1.c spends mos t of its time in the inner loop of the bubble-sort.As you know, there are faster sorting algorithms (e.g. QuickSort) – should we implement one of these?Before jumping to this conclusion, it is worth realizing that there could be two reasons why SortArray takesso long: (a) the function itself is slow (in which case switching to QuickSort could help), or (b) SortArrayis being called a large number of times (in which case, the caller must be examined more carefully).To determine which is the case, analyze p1 with the Call Graph Wizard. This produces a graphical viewof where the code is spending most of its time. In particular, it identifies the sequence of function calls,known as the critical path (shown in red), along which the bulk of the time is spent.First examine the critical path to determine which calls to SortArray are expensive. Now examine thecode of the caller. You will see that it is using a very poor algorithm to achieve its task. Suggest a simple,faster algorithm (no need to implement it).Problem 2: Better Data StructureThe code in p2.c implements a simple algorithmic fix for the hotspot in Problem 1.1Our goal in thisproblem is to compare two data structures used to represent sorted data. Compile p2.c with the commandmake p2 and run the First Use Wizard again (this time, select p2 as the Application to launch). Explainyour results. (Hint: Caches, locality.)Problem 3: Better InstructionsFinally, we will rewrite the code by identifying slow instructions and replacing them with faster ones.Compile the code in p3.c2, and this time run the Sampling Wizard. When the activity is complete, clickon “Clockticks” to examine the slowest function by clockticks. Click on the name of the slowest function toexamine its source code. To view the corrsponding x86 assembly instructions, click the “Mixed by Source”button above the “Line Number” column.Because modern processors reorder assembly instructions in order to achieve high performance, theclock-ticks per instruction may not be accurate. However, VTune is able to correctly identify the slowestblock of code. Rewrite this block to achieve improved performance. Explain w hy you made the changesyou did:1This speeds the code so dramatically, in fact, that we must run it several time in order to get representative results! Forp2.c, the constant ITERS has been increased from 2 to 64.2For p3.c, we have removed the expensive calls to AddToArray and AddToLinkedList. Take CS 225/CS 473 to learn efficientways of maintaining sorted


View Full Document

U of I CS 232 - Goal

Documents in this Course
Exam 1

Exam 1

5 pages

Exam 1

Exam 1

6 pages

Exam 2

Exam 2

6 pages

Exam 1

Exam 1

5 pages

Load more
Download Goal
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Goal and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Goal 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?