UCF CDA 5106 - Homework Solutions - D3456422



School name University of Central Florida

Course CDA 5106 - Advanced Computer Architecture

Pages 7


Modern Processor Design: Fundamentals of Superscalar Processors
John Paul Shen / Mikko H. Lipasti

Homework Solutions

Chapter 1

1. Using the resources of the World Wide Web, list the top five reported benchmark results for SPECINT2000, SPECFP2000, and TPC-C.

Current results are available from http://www.spec.org and http://www.tpc.org.

2. Graph SPECINT2000 vs. processor frequency for two different processor families (e.g., AMD Athlon and HP PA-RISC) for as many frequencies as are posted at www.spec.org. Comment on performance scaling with frequency, pointing out any anomalies and suggesting possible explanations for them.

Current results are available from http://www.spec.org; performance scaling will vary depending on processor family. Look for reasoning based on the following factors that do not scale with processor frequency: memory bandwidth, memory latency, new microarchitectural features, and compiler improvements.

3. Explain the differences between architecture, implementation, and realization. Explain how each of these relates to processor performance as expressed in Equation 1-1.

Architecture specifies the interface between the hardware and the programmer and affects the number of instructions that need to be executed for a particular program or algorithm. Implementation specifies the microarchitectural organization of the design, and determines the instruction execution rate (measured in instructions per cycle) for that particular design. The realization actually maps the implementation to silicon, and determines the cycle time. The product of the three terms determines performance.

4. As silicon technology evolves, implementation constraints and tradeoffs change, which can affect the placement and definition of the dynamic-static interface (DSI).
Explain why architecting a branch delay slot (as in the MIPS architecture) was a reasonable thing to do when that architecture was introduced, but is less attractive today.

When the MIPS ISA was created, a single-chip processor could reasonably contain a straightforward pipelined implementation like the MIPS R2000. There was no room on chip for branch predictors or decoupled fetch pipelines that could be used to mask branch latency. At the same time, processors were already fast enough to tolerate some additional complexity in the compilation process for finding independent instructions to place in the delay slot. Hence, a branch delay slot made sense, since it reduced the penalty due to branch instructions without negatively affecting implementation or realization. However, as silicon technology enabled more advanced pipelines, including superscalar fetch, other techniques for masking the branch latency became available and more attractive. Once this had occurred, the branch delay slot became an archaic special case that only complicated implementation and realization. Hence, more recent ISAs like PowerPC and Alpha omitted the branch delay slot.

5. Many times, implementation issues for a particular generation end up determining tradeoffs in instruction set architecture. Discuss at least one historical implementation constraint that explains why CISC instruction sets were a sensible choice in the 1970s.

Possible answers to this discussion question include the high cost of memory for program storage, which favored dense CISC encodings. In addition, limited on-chip implementation resources forced various dataflow components to be reused for various purposes. As a result, microcoded control over multicycle execution of each instruction was necessary. Once you assume multicycle microcoded execution, the additional decoding overhead due to CISC encoding becomes less important.

6.
A program's run time is determined by the product of instructions per program, cycles per instruction, and clock cycle time (the reciprocal of clock frequency). Assume the following instruction mix for a MIPS-like RISC instruction set: 15% stores, 25% loads, 15% branches, 30% integer arithmetic, 5% integer shift, and 5% integer multiply. Given that stores require one cycle, load instructions require two cycles, branches require four cycles, integer ALU instructions require one cycle, and integer multiplies require ten cycles, compute the overall CPI.

TABLE 1 CPI computation

Type      Mix   Cost  CPI
store     15%   1     0.15
load      25%   2     0.50
branch    15%   4     0.60
integer   30%   1     0.30
shift     5%    1     0.05
multiply  5%    10    0.50
Total                 2.10

7. Given the parameters of Problem 6, consider a strength-reducing optimization that converts multiplies by a compile-time constant into a sequence of shifts and adds. For this instruction mix, 50% of the multiplies can be converted to shift-add sequences with an average length of three instructions. Assuming a fixed frequency, compute the change in instructions per program, cycles per instruction, and overall program speedup.

There are 5% more instructions per program, the CPI is reduced by 12.7% to 1.83, and overall speedup is 2.1/1.925 = 1.091 or 9.1%.

8. Recent processors like the Pentium 4 do not implement single-cycle shifts. Given the scenario of Problem 7, assume that s = 50% of the additional integer and shift instructions introduced by strength reduction are shifts, and shifts now take four cycles to execute. Recompute the cycles per instruction and overall program speedup. Is strength reduction still a good optimization?

Speedup is now a slowdown: 2.1/2.1875 = 0.96 or 4% slowdown, hence strength reduction is a bad idea.

9. Given the assumptions of Problem 8, solve for the break-even ratio s (percentage of additional instructions that are shifts).
That is, find the value of s (if any) for which program performance is identical to the baseline case without strength reduction (Problem 6).

$$2.1 = 0.15 + 0.50 + 0.60 + 0.30 + 0.05 \times 4 + 0.25 + (1 - s) \times 0.075 \times 1 + s \times 0.075 \times 4$$

$$0.025 = 0.225\,s \;\Rightarrow\; s = 0.111 = 11.1\%$$

TABLE 2 CPI computation (Problem 7)

Type             Old Mix  New Mix  Cost  CPI
store            15%      15%      1     0.15
load             25%      25%      2     0.50
branch           15%      15%      4     0.60
integer & shift  35%      42.5%    1     0.425
multiply         5%       2.5%     10    0.25
Total            100%     105%           1.925/105% = 1.83

TABLE 3 CPI computation (Problem 8)

Type      Old Mix  New Mix  Cost  CPI
store     15%      15%      1     0.15
load      25%      25%      2     0.50
branch    15%      15%      4     0.60
integer   30%      33.75%   1     0.3375
shift     5%       8.75%    4     0.35
multiply  5%       2.5%     10    0.25
Total     100%     105%           2.1875/105% = 2.083

10. Given the assumptions of Problem 8, assume you are designing the shift unit on the Pentium 4 processor. You have concluded there
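The CPI arithmetic in Problems 6 through 9 can be checked with a short script. This is a minimal sketch, not part of the original solutions: the names `MIX`, `COST`, `cycles`, `strength_reduce`, and `break_even_shift_share` are hypothetical helpers introduced here. Two assumptions are worth stating. First, the mix as listed in Problem 6 sums to 95%, and the sketch follows the solutions' convention of treating the baseline instruction count as 100%, so the count after strength reduction is 105%. Second, as in the solutions, the baseline for Problems 8 and 9 is the 2.10 figure from Problem 6, computed with single-cycle shifts.

```python
# Baseline instruction mix (fractions of instructions) and cycle costs, Problem 6.
MIX = {"store": 0.15, "load": 0.25, "branch": 0.15,
       "integer": 0.30, "shift": 0.05, "multiply": 0.05}
COST = {"store": 1, "load": 2, "branch": 4,
        "integer": 1, "shift": 1, "multiply": 10}


def cycles(mix, cost):
    """Weighted cycle total: sum of (instruction fraction) x (cycles each)."""
    return sum(mix[kind] * cost[kind] for kind in mix)


def strength_reduce(mix, converted=0.5, seq_len=3, shift_share=0.5):
    """Mix after replacing a fraction of multiplies with shift/add sequences.

    converted   -- fraction of multiplies that are converted (Problem 7: 50%)
    seq_len     -- average length of each replacement sequence (Problem 7: 3)
    shift_share -- fraction s of the added instructions that are shifts
    """
    new = dict(mix)
    removed = mix["multiply"] * converted
    added = removed * seq_len
    new["multiply"] -= removed
    new["shift"] += added * shift_share
    new["integer"] += added * (1 - shift_share)
    return new


def break_even_shift_share(mix, cost, target):
    """Solve for s where cycles after strength reduction equal `target`.

    The cycle total is linear in s, so two evaluations determine the line.
    """
    at_zero = cycles(strength_reduce(mix, shift_share=0.0), cost)
    at_one = cycles(strength_reduce(mix, shift_share=1.0), cost)
    return (target - at_zero) / (at_one - at_zero)


# Problem 6: baseline.
base = cycles(MIX, COST)

# Problem 7: single-cycle shifts, so the shift/add split does not matter.
reduced = strength_reduce(MIX)
cycles_p7 = cycles(reduced, COST)
# Assumption: baseline count is taken as 1.0, per the solutions' convention.
relative_count = 1 + (sum(reduced.values()) - sum(MIX.values()))
cpi_p7 = cycles_p7 / relative_count
speedup_p7 = base / cycles_p7

# Problem 8: shifts now take four cycles, s = 50%.
COST_SLOW_SHIFT = dict(COST, shift=4)
cycles_p8 = cycles(reduced, COST_SLOW_SHIFT)
speedup_p8 = base / cycles_p8

# Problem 9: break-even s against the Problem 6 baseline.
s_break_even = break_even_shift_share(MIX, COST_SLOW_SHIFT, base)

print(f"P6 cycles per instruction: {base:.3f}")        # 2.100
print(f"P7 CPI: {cpi_p7:.3f}, speedup: {speedup_p7:.3f}")  # 1.833, 1.091
print(f"P8 speedup: {speedup_p8:.3f}")                 # 0.960
print(f"P9 break-even s: {s_break_even:.3f}")          # 0.111
```

Because the cycle total is linear in s, the break-even helper only needs the totals at s = 0 and s = 1, which mirrors the algebra in the Problem 9 solution.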
