SMU CSE 8383 - Multithreaded Microprocessors and Multiprocessor SoCs

Contents:
- Multithreaded Microprocessors and Multiprocessor SoCs
- Topics
- Why Consider MP on Chip?
- Simultaneous MT
- Multithreaded Usage Models
- What Can Be Shared, at What Cost?
- Athlon64 Die Photo
- Proliferation of Context Arbitration
- Why Simultaneous MT, Then?
- Alternative: The Fast Context Switch
- Alternative: Multiprocessor-on-Chip
- Mainstream CPU of 2008/2009 (45nm)
- Alternative: Heterogeneous MP
- BYOD – Bring Your Own Definition

Multithreaded Microprocessors and Multiprocessor SoCs
Sam Sandbote
CSE 8383 Advanced Computer Architecture
February 23, 2006

Topics
1. Technical Drivers
2. Simultaneous Multithreading
3. Alternative Perspectives
4. What is an SoC, Anyway?

Why Consider MP on Chip?
- The industry does not fundamentally change unless it is forced against a wall
  - We would prefer to scale as we always have, if we could
  - Most programmers are not skilled in the art of parallel programming
- A confluence of 3 trends has forced the industry to go MP:
  1. Architectural tricks to speed up single programs have limits
     • Locality of reference (cache size)
     • ILP (superscalar issue width, window size)
  2. Building faster-clocked logic is getting exponentially harder
  3. Process technology is still shrinking designs… we must "use" that area!

Simultaneous MT
- Concept: multiplex the execution of 2 or more threads
  - Each thread maintains its own architectural register state
  - PC, R0-Rn, SP, CC, etc. are all maintained per thread
- What happens when we mix two instruction streams?
  - They are guaranteed not to have any data dependencies between them
    • Even for memory addresses!
    • Only register dependencies are considered by out-of-order machines
  - Conceptually, the available ILP is doubled
  - We have enough unrelated instructions from a second thread to fill in the pipeline bubbles left by the first

Multithreaded Usage Models
- Coarse: application-level parallelism
  - Each context corresponds to a process under OS control
  - Make the OS believe two processors exist
  - Still hard to implement: Intel took 2 years to get the bugs out of Pentium 4 HT
- Fine: native multithreaded ISA
  - The constructs fork, join, and quit are machine instructions
  - What happens when we fork more threads than the hardware supports?
- Ultra-fine: well, basically the same as ILP
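The Simultaneous MT slide's claim that a second thread can fill the bubbles left by the first can be illustrated with a toy issue model. This is a minimal sketch under stated assumptions, not a model of any real core: `ThreadContext`, `issued_per_cycle`, and `utilization` are hypothetical names, each thread's ready instructions per cycle are given as a fixed profile, instructions that miss a slot are not carried over, and instructions from different threads are treated as independent because each thread has its own register state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadContext:
    """Per-thread architectural state, replicated once per hardware thread."""
    tid: int
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 16)  # R0-R15; size is arbitrary
    sp: int = 0
    cc: int = 0  # condition codes
    # Toy workload (assumption): independent instructions offered in each cycle.
    ready_per_cycle: List[int] = field(default_factory=list)


def issued_per_cycle(threads: List[ThreadContext], issue_width: int) -> List[int]:
    """Fill each cycle's issue slots from the threads in fixed priority order.

    Simplifications: instructions that do not fit are not carried over, and
    cross-thread instructions never depend on each other (separate registers).
    """
    cycles = max(len(t.ready_per_cycle) for t in threads)
    result = []
    for c in range(cycles):
        free = issue_width
        for t in threads:
            offered = t.ready_per_cycle[c] if c < len(t.ready_per_cycle) else 0
            free -= min(free, offered)
        result.append(issue_width - free)
    return result


def utilization(threads: List[ThreadContext], issue_width: int) -> float:
    """Fraction of all issue slots that were filled."""
    per_cycle = issued_per_cycle(threads, issue_width)
    return sum(per_cycle) / (issue_width * len(per_cycle))


if __name__ == "__main__":
    a = ThreadContext(tid=0, ready_per_cycle=[1, 0, 2, 1])  # thread with bubbles
    b = ThreadContext(tid=1, ready_per_cycle=[2, 3, 1, 2])  # second, unrelated thread
    print(issued_per_cycle([a], 4), utilization([a], 4))        # [1, 0, 2, 1] 0.25
    print(issued_per_cycle([a, b], 4), utilization([a, b], 4))  # [3, 3, 3, 3] 0.75
```

With a 4-wide issue stage, thread `a` alone fills a quarter of the slots; adding thread `b` raises utilization to three quarters in this made-up workload, which is the bubble-filling effect the slide describes.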
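The usage-models slide leaves open what happens when a native multithreaded ISA forks more threads than there are hardware contexts. The sketch below shows one possible policy, chosen here as an assumption rather than taken from the slides: a fork that finds no free context is placed in a software overflow queue, and a quit hands its freed context to the oldest queued thread. `HardwareThreadPool`, `fork`, and `quit` are hypothetical names, and join is not modeled.

```python
from collections import deque
from typing import Deque, Dict, Optional


class HardwareThreadPool:
    """Hypothetical model of fork/quit over a fixed number of hardware contexts.

    Assumed policy: if no context is free, fork queues the thread in software;
    quit releases a context and immediately gives it to the oldest queued thread.
    """

    def __init__(self, num_contexts: int) -> None:
        self.free: Deque[int] = deque(range(num_contexts))
        self.running: Dict[str, int] = {}   # thread name -> hardware context id
        self.waiting: Deque[str] = deque()  # forked but not yet given a context

    def fork(self, name: str) -> Optional[int]:
        """Start `name` on a free context and return its id, or queue it and return None."""
        if self.free:
            ctx = self.free.popleft()
            self.running[name] = ctx
            return ctx
        self.waiting.append(name)
        return None

    def quit(self, name: str) -> Optional[str]:
        """Release `name`'s context; return the queued thread that inherits it, if any."""
        ctx = self.running.pop(name)
        if self.waiting:
            nxt = self.waiting.popleft()
            self.running[nxt] = ctx
            return nxt
        self.free.append(ctx)
        return None


if __name__ == "__main__":
    pool = HardwareThreadPool(num_contexts=2)
    print(pool.fork("t0"), pool.fork("t1"), pool.fork("t2"))  # 0 1 None
    print(pool.quit("t0"), pool.running)  # t2 {'t1': 1, 't2': 0}
```

Other policies are equally plausible (for example, trapping to the OS or running the child inline in the parent); the queue is used here only because it is easy to follow.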
What Can Be Shared, at What Cost?

Resource                | Impact on Single-Thread Performance | Notes
------------------------+-------------------------------------+-----------------------------
Fetch bandwidth         | High                                |
Instruction cache       | Medium                              | Must support hit-under-miss
Branch predictor state  | Medium                              |
Exec units              | None                                | Small and cheap to replicate
Data cache              | Very High                           | Must support hit-under-miss

Athlon64 Die Photo
[Figure: Athlon64 die photo]

Proliferation of Context Arbitration
- Sharing implies:
  - The programmer declares a QoS for a thread upon its startup
  - This QoS must be distributed
- Arbitration must exist for:
  - Fetch bandwidth
  - Dispatch and/or issue
  - Cache and/or branch predictor utilization
    • This is a very good area for research
  - External access bandwidth/latency
- Additional pipeline cycles are introduced for arbitration
  - This is BAD!

Why Simultaneous MT, Then?
- Most efficient in terms of aggregate IPC
- Consider 4 threads, each with a typical instruction mix:
  - 20% loads, 10% stores
  - 20% branches
  - 50% in-CPU instructions (ADD, MOV, etc.)
- Using 4 superscalar speculative processors:
  - 4 processors, each with IPC around 0.8
- Using a 4-way multithreaded processor:
  - 1 (larger) processor with IPC 1.4 or better

Alternative: The

