Berkeley COMPSCI 252 - CS 252 Problem - D74295

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 252> CS 252 Problem

DOC PREVIEW

Berkeley COMPSCI 252 - CS 252 Problem

School name University of California, Berkeley

Course Compsci 252- Graduate Computer Architecture

Pages 5

This preview shows page 1-2 out of 5 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 5 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Problem 1:You are building a system around a processor with inorder execution that runs at 1.1 GHz and has a CPI of 0.7 excluding memory accesses. The only instructions that read or write data from memory are loads (20% of all instructions) and stores (5% of all instructions).The memory system for this computer is composed of a split L1 cache that imposes no penalty on hits. Both the I-cache and D-cache are direct mapped and hold 32 KB each. The I-cache has a 2% miss rate and 32-byte blocks, and the Dcache is write through with a 5% miss rate and 16-byte blocks. There is a write buffer on the D-cache that eliminates stalls for 95% of all writes. The 512 KB write-back, unified L2 cache has 64-byte blocks and an access time of 15 ns. It is connected to the L1 cache by a 128-bit data bus that runs at 266 MHz and can transfer one 128-bit word per bus cycle. Of all memory references sent to the L2 cache in this system, 80% are satisfied without going to main memory. Also, 50% of all blocks replaced are dirty. The 128-bit-wide main memory has an access latency of 60 ns, after which any number of bus words may be transferred at the rate of one percycle on the 128-bit-wide 133 MHz main memory bus.a) What is the average memory access time for instruction accesses?b) What is the average memory access time for data reads?c) What is the average memory access time for data writes?d) What is the overall CPI, including memory accesses?e) You are considering replacing the 1.1 GHz CPU with one that runs at 2.1 GHz, but isotherwise identical. How much faster does the system run with a faster processor? Assume the L1 cache still has no hit penalty, and that the speed of the L2 cache, main memory, and buses remains the same in absolute terms (e.g., the L2 cache still has a 15 ns access time and a 266 MHz bus connecting it to the CPU and L1 cache).f) If you want to make your system run faster, which part of the memory system would you improve? Graph the change in overall system performance holding all parameters fixed except the one that you’re improving. Parameters you might consider improving include L2 cache speed, bus speeds, main memory speed, and L1and L2 hit rates. Based on these graphs, how could you best improve overall system performance with minimal cost?Problem 2:As caches increase in size, blocks often increase in size as well.a) If a large instruction cache has larger data blocks, is there still a need for prefetching? Explain the interaction between prefetching and increased block size in instruction caches.b) Is there a need for data prefetch instructions when data blocks get larger? Explain.Problem 3:Some memory systems handle TLB misses in software (as an exception), while othersuse hardware for TLB misses.a) What are the trade-offs between these two methods for handling TLB misses?b) Will TLB miss handling in software always be slower than TLB miss handling in hardware? Explain.c) Are there page table structures that would be difficult to handle in hardware, but possible in software? Are there any such structures that would be difficult for softwareto handle but easy for hardware to manage?d) Use the data from Figure 5.45 to calculate the penalty to CPI for TLB misses on the following workloads assuming hardware TLB handlers require 10 cycles per miss and software TLB handlers take 30 cycles per miss: (50% gcc, 25% perl, 25% ijpeg), (30%swim, 30% wave5, 20% hydro2d, 10% gcc).e) Are the TLB miss times in part (d) realistic? Discuss.f) Why are TLB miss rates for floating-point programs generally higher than those for integer programs?Problem 4:Disk sectors are addressed sequentially within a track, tracks sequentially within cylinders, and cylinders sequentially within the disk. Determining head switch time and cylinder switch time is difficult because of rotational effects. Even determining platter count, sectors/track, rotational delay, and minimum time to media is difficult based on observation of typical disk workloads. The key is to factor out disk rotationaleffects by making consecutive seeks to individual sectors with addresses that differ by a linearly increasing amount starting with 0, 1, 2, and so forth. The Skippy algorithm, from work by Nisha Talagala and colleagues of U.C. Berkeley[2000], isfd = open(“raw disk device”);for (i = 0; i < measurements; i++) {//time the following sequence, and output <i, time>lseek(fd, i * SINGLE_SECTOR, SEEK_CUR);write(fd, buffer, SINGLE_SECTOR);}close(fd);The basic algorithm skips through the disk, increasing the distance of the seek byone sector before every write, and outputs the distance and time for each write.The raw device interface is used to avoid file system optimizations. SINGLE_SECTOR is the size of a single sector in bytes. The SEEK_CUR argument tolseek moves the file pointer an amount relative to the current pointer. A technicalreport describing Skippy and two other disk drive microbenchmarks (run in secondsor minutes rather than hours or days) is at http://sunsite.berkeley.edu/Dienst/UI/2.0/Describe/ncstrl.ucb/CSD-99-1063.Figure 1 shows the output from running the benchmark Skippy on a disk.a) What is the number of heads? The number of platters?b) What is the rotational latency?c) What is the head switch time?d) What is the cylinder switch time?e) What is the minimum time to media plus transfer time? The minimum time to media is the minimum time to access the disk platter surface. A disk request completes in the sum of the minimum time to media plus the transfer time if there is no rotational or seek latency.Figure 1: Example output of Skippy for a hypothetical diskProblem 5:Suppose a processor sends 10 disk I/Os per second, these requests are exponentially distributed, and the average service time of an older disk is 20 ms. Answer the following questions:a) On average, how utilized is the disk?b) What is the average time spent in the queue?c) What is the average response time for a disk request, including the queuing time and disk service time?d) Calculate the average length of the queue.e) Calculate the average length of the system.f) Suppose we get a new, faster disk. Recalculate the answers to the questions a-e above, assuming the disk service time is 10 ms.g) Suppose instead of a new, faster disk, we add a second slow disk, and duplicate the data so that reads can be serviced by either disk. Let’s assume that the requests are all reads. Recalculate the answers to questions a-e, this time using an M/M/m

View Full Document

Berkeley COMPSCI 252 - CS 252 Problem

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 5 pages.

Berkeley COMPSCI 252 - CS 252 Problem

Sign up for free to view:

Please select your school