Berkeley COMPSCI 252 - Lec05-speculation

Lecture 5: ILP Continued: Intro to VLIW and Superscalar
Review: Three Parts of the Scoreboard
Review: Scoreboard Summary
Beyond CPI = 1
Slide 5
Architectures for Embedded Systems vs. GPC
Embedded Systems: Products - 1
Embedded Systems: Products - 2
Embedded System implementation
Integration boosts performance/cuts cost
Memory Dominance in StrongArm
Embedded Systems vs. General Purpose Computing - 1
Embedded Systems vs. General Purpose Computing - 2
Trickle Down Theory of Embedded Architectures
Getting CPI < 1: Issuing Multiple Instructions/Cycle
Another Dynamic Algorithm: Tomasulo Algorithm
Tomasulo Algorithm vs. Scoreboard
Tomasulo Organization
Reservation Station Components
Three Stages of Tomasulo Algorithm
Tomasulo Example Cycle 0
Review: Tomasulo
HW support for More ILP
Dynamic Branch Prediction Summary
Slide 25
Slide 26
Four Steps of Speculative Tomasulo Algorithm
Renaming Registers
Dynamic Scheduling in PowerPC 604 and Pentium Pro
Slide 30
Dynamic Scheduling in Pentium Pro
Slide 32
Review: Unrolled Loop that Minimizes Stalls for Scalar
Loop Unrolling in Superscalar
Multiple Issue Challenges
Loop Unrolling in VLIW
Trace Scheduling
Advantages of HW (Tomasulo) vs. SW (VLIW) Speculation
Superscalar v. VLIW
Intel/HP “Explicitly Parallel Instruction Computer (EPIC)”
Dynamic Scheduling in Superscalar
Slide 42
Performance of Dynamic SS
Software Pipelining
Software Pipelining Example
SW Pipelined Assembler
Limits to Multi-Issue Machines
Slide 48
Limits to ILP
Slide 50
Upper Limit to ILP: Ideal Machine (Figure 4.38, page 319)
More Realistic HW: Branch Impact (Figure 4.40, page 323)
More Realistic HW: Register Impact (Figure 4.44, page 328)
More Realistic HW: Alias Impact (Figure 4.46, page 330)
Realistic HW for ’9X: Window Impact (Figure 4.48, page 332)
PowerPoint Presentation
3 1996 Era Machines
3 1997 Era Machines
Summary

KK CS252 1
Lecture 5: ILP Continued: Intro to VLIW and Superscalar
Prepared by: Professor David A. Patterson
Computer Science 252, Fall 1998
Edited, expanded, and presented by: Prof. Kurt Keutzer
Computer Science 252, Spring 2000

KK CS252 2
Review: Three Parts of the Scoreboard
1. Instruction status: which of 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
–Busy: indicates whether the unit is busy or not
–Op: operation to perform in the unit (e.g., + or –)
–Fi: destination register
–Fj, Fk: source-register numbers
–Qj, Qk: functional units producing source registers Fj, Fk
–Rj, Rk: flags indicating when Fj, Fk are ready
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register

KK CS252 3
Review: Scoreboard Summary
•Speedup of 1.7 for compiled code; 2.5 for hand-coded assembly, BUT with slow memory (no cache)
•Limitations of 6600 scoreboard
–No forwarding (a result must first be written to the register, then read from it)
–Limited to instructions in a basic block (small window)
–Number of functional units (structural hazards)
–Wait for WAR hazards
–Prevent WAW hazards

KK CS252 4
Beyond CPI = 1
•Initial goal to achieve CPI = 1
•Can we improve beyond this?
•Two approaches
•Superscalar:
–varying number of instructions/cycle (1 to 8)
–scheduled by compiler or by HW (Tomasulo)
–e.g. IBM PowerPC, Sun UltraSparc, DEC Alpha, HP 8000
–The successful approach (to date) for general purpose computing
•Anticipated success led to the use of Instructions Per Clock cycle (IPC) instead of CPI

KK CS252 5
Beyond CPI = 1
•Alternative approach
•(Very) Long Instruction Words (V)LIW:
–fixed number of instructions (4-16)
–scheduled by the compiler; put ops into wide templates
–So far more successful in DSP and multimedia applications
–Joint HP/Intel agreement in 1999/2000
–Intel Architecture-64 (Merced/IA-64): 64-bit address
–Style: “Explicitly Parallel Instruction Computer (EPIC)”
•But first a little context …

KK CS252 6
Architectures for Embedded Systems vs. GPC
•Traditionally, embedded processors have (economically) dominated general purpose processors
–quite significantly in numbers shipped (8 bit vs. 32 bit)
–also in revenue
•Still, for some time high-end microprocessors were the technological drivers of the semiconductor industry
–First due to high-end workstations
–Then due to personal computers
•Increasingly, embedded systems rather than computer products are driving both the economics and the technology of the semiconductor industry
•This increasingly motivates a study of processors, and their architectures, for embedded systems

KK CS252 7
Embedded Systems: Products - 1
Computer Related: personal digital assistant, printer, disc drive, multimedia subsystem, graphics subsystem, graphics terminal
Consumer Electronics: HDTV, CD player, video games, video tape recorder, programmable TV, camera, music system
Communications: cellular phone, video phone, fax, modems, PBX

KK CS252 8
Embedded Systems: Products - 2
Control Systems
•Automotive: engine, ignition, brake system
•Manufacturing process control: robotics
•Remote control: satellite control, spacecraft control
•Other mechanical control: elevator control
Office Equipment
•smart copier, printer, smart typewriter, calculator
•point-of-sale equipment: credit-card validator, UPC code reader, cash register
Medical Applications
•instruments: EKG, EEG
•scanning
•imaging

Embedded System implementation
[Figure: implementation options for system functionality: off-the-shelf µP, embedded core µP, DSP, application-specific µP (ASIP), ASIC; the DSP core and the ASIP core are each shown with program ROM, coefficient ROM, and control]

KK CS252 10
Integration boosts performance/cuts cost
[Figure: Digital camera hardware diagram (mechanical shutter, CMOS imager, A/D, image-processing ASIC, two 256Kx16 DRAMs, MCU, 32Kx8 SRAM, LCD control ASIC and LCD, memory card interface, PCMCIA, serial EEPROM, power control from a 3.3V CR-123 lithium cell, user interface keys, activity LED, door interlock), annotated "ASIC Integration Opportunity"]

KK CS252 11
Memory Dominance in StrongArm
[Figure: Compaq/Digital StrongARM]

KK CS252 12
Embedded Systems vs. General Purpose Computing - 1
Embedded System
•Runs a few applications, often known at design time
•Not end-user programmable
•Operates under fixed run-time constraints; additional performance may not be useful/valuable
General purpose computing
•Intended to run a fully general set of applications
•End-user programmable
•Faster is always better

KK CS252 13
Embedded Systems vs. General Purpose Computing - 2
Embedded System
•Differentiating features:
–power
–cost
–speed (must be predictable)
General purpose computing
•Differentiating features:
–speed (need not
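
The three scoreboard tables reviewed on slide KK CS252 2 can be sketched in a few lines of Python. This is a minimal, hypothetical model, not the CDC 6600 hardware or code from the lecture: the names `FUStatus`, `Scoreboard`, `issue`, `read_operands`, and `write_result` are invented for illustration, instruction status is implied by which method has succeeded, and execution latency is not modeled (a unit may write its result as soon as it has read its operands). It shows where the limitations on slide KK CS252 3 come from: issue stalls on a busy unit (structural hazard) or a pending destination (WAW), operand read waits for Rj and Rk (RAW, with no forwarding), and write result waits while another unit still has to read the old register value (WAR).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (the 9 fields on the slide)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # unit producing Fj (None: value is in the register file)
    qk: Optional[str] = None  # unit producing Fk
    rj: bool = False          # Fj is ready and not yet read
    rk: bool = False          # Fk is ready and not yet read


class Scoreboard:
    """Hypothetical sketch of the functional unit and register result status tables."""

    def __init__(self, unit_names: Iterable[str]):
        self.fu: Dict[str, FUStatus] = {name: FUStatus() for name in unit_names}
        # Register result status: register -> unit that will write it (absent: none pending).
        self.result: Dict[str, str] = {}

    def issue(self, unit: str, op: str, dest: str, src1: str, src2: str) -> bool:
        # Stall on a structural hazard (unit busy) or a WAW hazard (dest already pending).
        if self.fu[unit].busy or dest in self.result:
            return False
        qj, qk = self.result.get(src1), self.result.get(src2)
        self.fu[unit] = FUStatus(True, op, dest, src1, src2, qj, qk, qj is None, qk is None)
        self.result[dest] = unit
        return True

    def read_operands(self, unit: str) -> bool:
        s = self.fu[unit]
        # RAW: wait until both sources are in the register file (no forwarding).
        if not (s.busy and s.rj and s.rk):
            return False
        s.rj = s.rk = False  # operands consumed
        return True

    def write_result(self, unit: str) -> bool:
        s = self.fu[unit]
        # Assumption: execution finishes as soon as the operands have been read.
        if not s.busy or s.rj or s.rk or s.qj is not None or s.qk is not None:
            return False
        # WAR: wait while another unit still has to read the old value of Fi.
        for name, other in self.fu.items():
            if name != unit and ((other.fj == s.fi and other.rj) or (other.fk == s.fi and other.rk)):
                return False
        # Wake up units waiting on this result, then free the register and the unit.
        for other in self.fu.values():
            if other.qj == unit:
                other.qj, other.rj = None, True
            if other.qk == unit:
                other.qk, other.rk = None, True
        del self.result[s.fi]
        self.fu[unit] = FUStatus()
        return True
```

For example, after issuing MULTD F0,F2,F4 to a unit named "Mult1" and DIVD F10,F0,F6 to "Div", a later ADDD F6,F8,F2 on "Add" can issue and read its operands, but `write_result("Add")` returns False until Div has read F6, which in turn waits for Mult1 to write F0.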

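Slide KK CS252 4 switches from CPI to IPC because a machine that issues several instructions per cycle can finish in fewer cycles than it has instructions: CPI = cycles / instructions drops below 1, and IPC = instructions / cycles = 1/CPI rises above 1. The sketch below counts cycles for a hypothetical in-order superscalar that issues up to `width` instructions per cycle. The model (results usable one cycle after issue, unlimited functional units, no branches or cache misses) and all names in it (`in_order_issue_cycles`, `cpi_and_ipc`, registers r0 to r4) are assumptions for illustration, not taken from the lecture.

```python
from typing import List, Sequence, Tuple

# One instruction: (destination register, source registers), in program order.
Instr = Tuple[str, Sequence[str]]


def in_order_issue_cycles(program: List[Instr], width: int) -> int:
    """Cycles needed to issue `program` on a hypothetical in-order machine that
    issues up to `width` instructions per cycle.

    Assumed model: a result can be used by instructions issued in the next cycle,
    functional units are unlimited, and there are no branches or cache misses."""
    if width < 1:
        raise ValueError("width must be at least 1")
    cycles, i = 0, 0
    while i < len(program):
        cycles += 1
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if any(src in written_this_cycle for src in srcs):
                break  # RAW on a value produced this cycle: in-order issue stops here
            written_this_cycle.add(dest)
            issued += 1
            i += 1
    return cycles


def cpi_and_ipc(program: List[Instr], width: int) -> Tuple[float, float]:
    """Return (CPI, IPC); IPC is simply 1 / CPI."""
    if not program:
        raise ValueError("program must not be empty")
    cycles = in_order_issue_cycles(program, width)
    n = len(program)
    return cycles / n, n / cycles
```

With `width=2`, four independent instructions take 2 cycles (CPI 0.5, IPC 2.0). The sequence r1←r0, r2←r1, r3←r0, r4←r3 takes 3 cycles: in-order issue stops at r2, which needs r1 from the same cycle, even though r3 is independent. Reordering it to r1, r3, r2, r4 brings it back to 2 cycles, the kind of reordering that compiler scheduling or dynamic (Tomasulo-style) scheduling is meant to provide.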

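Slide KK CS252 5 describes VLIW as a fixed number of operations per instruction, scheduled by the compiler into wide templates. A minimal sketch of that compiler step, under heavy simplifying assumptions: greedy list scheduling with program order as the priority, any op fits any slot, each op's result is usable in the next bundle, and op names are unique. The function `pack_bundles`, the `NOP` marker, and the op names used below (`ld1`, `add1`, `sd1`, ...) are hypothetical; a real VLIW/EPIC compiler would also have to respect slot types and multi-cycle latencies.

```python
from typing import Dict, List, Sequence

NOP = "nop"


def pack_bundles(ops: Sequence[str], deps: Dict[str, Sequence[str]], width: int) -> List[List[str]]:
    """Greedy list scheduling of `ops` into fixed-width VLIW bundles (hypothetical model).

    deps[op] lists the ops whose results `op` needs; they must sit in an earlier bundle.
    Any op fits any slot, every op has a 1-bundle latency, ties go to program order,
    and unused slots are padded with NOPs."""
    if width < 1:
        raise ValueError("width must be at least 1")
    placed: Dict[str, int] = {}  # op -> index of the bundle it was placed in
    bundles: List[List[str]] = []
    remaining = list(ops)
    while remaining:
        cycle = len(bundles)
        bundle: List[str] = []
        for op in list(remaining):  # iterate over a copy so we can remove from `remaining`
            if len(bundle) == width:
                break
            # Ready only if every producer was placed in an earlier bundle.
            if all(placed.get(d, cycle) < cycle for d in deps.get(op, ())):
                bundle.append(op)
                remaining.remove(op)
        if not bundle:
            raise ValueError("dependences are cyclic or name an unknown op")
        for op in bundle:
            placed[op] = cycle
        bundles.append(bundle + [NOP] * (width - len(bundle)))
    return bundles
```

For two independent load, add, store chains (deps add1←ld1, sd1←add1, add2←ld2, sd2←add2) and width 3, the result is [ld1, ld2, nop], [add1, add2, nop], [sd1, sd2, nop]: three bundles with a third of the slots wasted on NOPs, which is why a VLIW compiler relies on techniques such as loop unrolling to find enough independent operations to fill the template.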