Multithreaded Microprocessors and Multiprocessor SoCs
Sam Sandbote
CSE 8383 Advanced Computer Architecture
February 23, 2006

Topics
1. Technical Drivers
2. Simultaneous Multithreading
3. Alternative Perspectives
4. What is an SoC, Anyway?

Why Consider MP on Chip?
The industry does not fundamentally change unless it is forced against a wall.
• We would prefer to scale as we always have, if we could.
• Most programmers are not skilled in the art of parallel programming.
A confluence of three trends has forced the industry to go MP:
1. Architectural tricks to speed up single programs have limits
   • Locality of reference (cache size)
   • ILP (superscalar issue width, window size)
2. Building faster-clocked logic is getting exponentially harder.
3. Process technology is still shrinking designs… we must “use” that area!

Simultaneous MT
Concept: multiplex the execution of 2 or more threads.
• Each thread maintains its own architectural register state: PC, R0-Rn, SP, CC, etc.
What happens when we mix two instruction streams?
• They are guaranteed not to have any data dependencies between them
   • Even for memory addresses!
   • Only register dependencies are considered by out-of-order machines
• Conceptually, available ILP is doubled
• We have enough unrelated instructions from a second thread to fill in the pipeline bubbles left by the first

Multithreaded Usage Models
Coarse: Application-Level Parallelism
• Each context corresponds to a process under OS control
• Make the OS believe two processors exist
• Still hard to implement: Intel took 2 years to get the bugs out of Pentium 4 HT
Fine: Native Multithreaded ISA
• The constructs fork, join, and quit are machine instructions
• What happens when we fork more threads than the hardware supports?
Ultra-Fine: Basically the same as ILP
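
A toy sketch of the bubble-filling argument from the Simultaneous MT slide. Everything in it is an illustrative assumption rather than anything from the slides: the 4-wide issue, the made-up per-cycle readiness counts, the fixed-priority fill policy, and the 16-register context layout. Instructions that cannot issue in a cycle are simply dropped rather than carried forward, so this counts issue-slot occupancy only; it is not a pipeline model.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Architectural state replicated per hardware thread.
    Hypothetical layout: 16 general registers is an assumption."""
    pc: int = 0
    sp: int = 0
    cc: int = 0                                   # condition codes
    regs: list = field(default_factory=lambda: [0] * 16)


def single_thread_slots(ready, width):
    """Issue slots filled when one thread runs alone.
    ready[c] = instructions the thread could issue in cycle c (0 = bubble)."""
    return sum(min(r, width) for r in ready)


def smt_slots(streams, width):
    """Issue slots filled when several threads share one issue width.
    Assumed policy: thread 0 goes first, later threads fill what is left."""
    used = 0
    for per_cycle in zip(*streams):               # one tuple per cycle
        free = width
        for r in per_cycle:
            take = min(r, free)
            used += take
            free -= take
    return used


# Toy example: a 4-wide machine over 6 cycles (made-up readiness numbers).
WIDTH = 4
thread_a = [4, 1, 0, 2, 0, 3]
thread_b = [2, 3, 4, 0, 1, 2]

contexts = [ThreadContext(), ThreadContext()]     # one per hardware thread
contexts[0].regs[1] = 42                          # write stays private to thread 0

total = WIDTH * len(thread_a)
print("thread A alone:", single_thread_slots(thread_a, WIDTH), "of", total)
print("A + B (SMT):   ", smt_slots([thread_a, thread_b], WIDTH), "of", total)
```

With these invented numbers, thread A alone fills 10 of 24 issue slots; letting thread B fill A's bubbles raises that to 19. The two ThreadContext objects show the other half of the slide: PC, SP, CC, and registers are private per thread, so a write in one context is invisible to the other.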

What Can Be Shared, at What Cost?
Resource                 Impact on Single-Thread Performance   Notes
Fetch Bandwidth          High
Instruction Cache        Medium                                Must support hit-under-miss
Branch Predictor State   Medium
Exec Units               None                                  Small and cheap to replicate
Data Cache               Very High                             Must support hit-under-miss

Athlon64 Die Photo
[Figure: Athlon64 die photo]

Proliferation of Context Arbitration
Sharing implies:
• The programmer declares a QoS for a thread upon its startup
• This QoS must be distributed
Arbitration must exist for:
• Fetch bandwidth
• Dispatch and/or issue
• Cache and/or branch predictor utilization
   • This is a very good area for research
• External access bandwidth/latency
Additional pipeline cycles are introduced for arbitration
• This is BAD!

Why Simultaneous MT, Then?
Most efficient in terms of aggregate IPC per processor
Consider 4 threads, each with a typical instruction mix:
• 20% loads, 10% stores
• 20% branches
• 50% in-CPU instructions (ADD, MOV, etc.)
Using 4 superscalar speculative processors:
• 4 processors, each with IPC around 0.8
Using a 4-way multithreaded processor:
• 1 (larger) processor with IPC 1.4 or better

Alternative: The