CS61C Machine Structures
Lecture 22: Introduction to Performance
November 17, 2000
David Patterson
http://www-inst.eecs.berkeley.edu/~cs61c/
CS61C L221 Performance UC Regents

Review (1/2)
  Optimal Pipeline
    - Each stage is executing part of an instruction each clock cycle.
    - One instruction finishes during each clock cycle.
    - On average, execute far more quickly.
  What makes this work?
    - Similarities between instructions allow us to use same stages for all instructions (generally).
    - Each stage takes about the same amount of time as all others: little wasted time.

Review (2/2)
  Pipelining a Big Idea: widely used concept
  What makes it less than perfect?
    - Structural hazards: suppose we had only one cache? Need more HW resources.
    - Control hazards: need to worry about branch instructions? Delayed branch.
    - Data hazards: an instruction depends on a previous instruction?

Outline
  - Performance Calculation
  - Virtual Memory Review
  - Benchmarks

Performance
  Purchasing Perspective: given a collection of machines, which has the
    - best performance?
    - least cost?
    - best performance / cost?
  Computer Designer Perspective: faced with design options, which has the
    - best performance improvement?
    - least cost?
    - best performance / cost?
  Both require: basis for comparison and metric for evaluation.

Two Notions of "Performance"

  Plane              DC to Paris   Top Speed   Passengers   Throughput (pmph)
  Boeing 747         6.5 hours     610 mph     470          286,700
  BAC/Sud Concorde   3 hours       1350 mph    132          178,200

  Which has higher performance?
    - Time to deliver 1 passenger?
    - Time to deliver 400 passengers?
  In a computer, time for 1 job is called Response Time or Execution Time.
  In a computer, jobs per day is called Throughput or Bandwidth.

Definitions
  Performance is in units of things per second: bigger is better.
  If we are primarily concerned with response time:
    performance(x) = 1 / execution_time(x)
  "X is n times faster than Y" means:
    Performance(X) / Performance(Y) = n

Example of Response Time v. Throughput
  Time of Concorde vs. Boeing 747?
    - Concorde is 6.5 hours / 3 hours = 2.2 times faster.
  Throughput of Boeing vs. Concorde?
    - Boeing 747: 286,700 pmph / 178,200 pmph = 1.6 times faster.
  Boeing is 1.6 times ("60%") faster in terms of throughput.
  Concorde is 2.2 times ("120%") faster in terms of flying time (response time).
  We will focus primarily on execution time for a single job.

Confusing Wording on Performance
  Will (try to) stick to "n times faster"; it's less confusing than "m% faster".
  As "faster" means both increased performance and decreased execution time, to reduce confusion will use "improve performance" or "improve execution time".

What is Time?
  Straightforward definition of time:
    - Total time to complete a task, including disk accesses, memory accesses, I/O activities, operating system overhead, ...
    - "real time", "response time" or "elapsed time"
  Alternative: just the time the processor (CPU) is working only on your program (since multiple processes are running at the same time)
    - "CPU execution time" or "CPU time"
    - Often divided into system CPU time (in OS) and user CPU time (in user program)

How to Measure Time?
  User Time: seconds
  CPU Time: Computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware.
    - These discrete time intervals are called clock cycles (or informally clocks or cycles).
    - Length of clock period: clock cycle time (e.g., 2 nanoseconds or 2 ns) and clock rate (e.g., 500 megahertz, or 500 MHz), which is the inverse of the clock period; use these!

Measuring Time using Clock Cycles (1/2)
  CPU execution time for program
    = Clock Cycles for a program x Clock Cycle Time
  or
    = Clock Cycles for a program / Clock Rate

Measuring Time using Clock Cycles (2/2)
  One way to define clock cycles:
    Clock Cycles for program
      = Instructions for a program (called "Instruction Count")
        x Average Clock cycles Per Instruction (abbreviated "CPI")
  CPI is one way to compare two machines with the same instruction set, since Instruction Count would be the same.

Performance Calculation (1/2)
  CPU execution time for program
    = Clock Cycles for program x Clock Cycle Time
  Substituting for clock cycles:
  CPU execution time for program
    = (Instruction Count x CPI) x Clock Cycle Time
    = Instruction Count x CPI x Clock Cycle Time

Performance Calculation (2/2)
  CPU time = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
  CPU time = Seconds / Program
  Product of all 3 terms: if missing a term, can't predict time, the real measure of performance.

Administrivia: Rest of 61C
  Rest of 61C slower pace: 1 project, 1 lab, no more homeworks
    F    11/17  Performance; Cache Sim Project
    W    11/24  X86, PC buzzwords and 61C; RAID Lab
    W    11/29  Review: Pipelines; Feedback lab
    F    12/1   Review: Caches/TLB/VM; Section 7.5
    M    12/4   Deadline to correct your grade record
    W    12/6   Review: Interrupts (A.7); Feedback lab
    F    12/8   61C Summary / Your Cal heritage / HKN Course Evaluation
    Sun  12/10  Final Review, 2PM (155 Dwinelle)
    Tues 12/12  Final, 5PM (1 Pimentel)

How to Calculate the 3 Components?
  Clock Cycle Time: in specification of computer (Clock Rate in advertisements)
  Instruction Count:
    - Count instructions in loop of small program
    - Use simulator to count instructions
    - Hardware counter in special register (Pentium II)
  CPI:
    - Calculate: Execution Time / (Clock cycle time x Instruction Count)
    - Hardware counter in special register (PII)

Calculating CPI Another Way
  - First calculate CPI for each individual instruction (add, sub, and, etc.).
  - Next calculate frequency of each individual instruction.
  - Finally multiply these two for each instruction and add them up to get final CPI.

Example (RISC processor)
Can Calculate Memory portion of CPI …
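
The definitions and equations in these slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the lecture: the function names, the instruction mix (class frequencies and cycles per class), and the instruction count of one million are hypothetical values chosen for the example. The plane figures and the 500 MHz / 2 ns clock come from the slides.

```python
def times_faster(perf_x, perf_y):
    """Return n such that 'X is n times faster than Y' (ratio of performances)."""
    return perf_x / perf_y


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time (seconds) = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * clock_cycle_time


def average_cpi(mix):
    """Weighted CPI: sum over instruction classes of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def cpi_from_measurement(execution_time, clock_cycle_time, instruction_count):
    """CPI = Execution Time / (Clock Cycle Time x Instruction Count)."""
    return execution_time / (clock_cycle_time * instruction_count)


# Response time: performance = 1 / execution time (hours, DC to Paris).
concorde_vs_boeing = times_faster(1 / 3.0, 1 / 6.5)

# Throughput in passenger-miles per hour = passengers x top speed.
boeing_vs_concorde = times_faster(470 * 610, 132 * 1350)

# HYPOTHETICAL instruction mix: class -> (frequency, cycles per instruction).
mix = {
    "alu": (0.5, 1),
    "load": (0.2, 2),
    "store": (0.1, 2),
    "branch": (0.2, 2),
}
cpi = average_cpi(mix)

# Clock rate of 500 MHz, so the clock cycle time is 2 ns.
clock_cycle_time = 1 / 500e6

# HYPOTHETICAL instruction count of one million.
instruction_count = 1_000_000
seconds = cpu_time(instruction_count, cpi, clock_cycle_time)

# Going the other way: recover CPI from a measured execution time.
recovered_cpi = cpi_from_measurement(seconds, clock_cycle_time, instruction_count)

if __name__ == "__main__":
    print(f"Concorde is {concorde_vs_boeing:.1f} times faster (response time)")
    print(f"Boeing 747 is {boeing_vs_concorde:.1f} times faster (throughput)")
    print(f"Average CPI = {cpi:.2f}")
    print(f"CPU time = {seconds * 1e3:.2f} ms")
```

With the plane figures from the table, the two ratios round to 2.2 and 1.6, matching the slides. For the made-up mix, the weighted CPI is 1.5, so one million instructions at 2 ns per cycle take about 3 ms.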


Berkeley COMPSCI 61C - Lecture 22 - Introduction to Performance
