UW-Madison ECE/CS 752 - Pentium Pro Case Study

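Contents
• Pentium Pro Case Study
• Goals of P6 Microarchitecture
• P6 – The Big Picture
• Memory Hierarchy
• Instruction Fetch
• Instruction Cache Unit
• Branch Target Buffer
• Branch Prediction Algorithm
• Instruction Decode - 1
• Instruction Decode - 2
• What is a uop?
• Instruction Dispatch
• Register Renaming - 1
• Register Renaming - Example
• Challenges to Register Renaming
• Out-of-Order Execution Engine
• Reservation Station
• Memory Ordering Buffer (MOB)
• Instruction Completion
• Pentium Pro Design Methodology - 1
• Pentium Pro Performance Analysis
• Performance – Run Times
• Performance – IPC vs. uPC
• Performance – Cache Misses
• Performance – Branch Prediction
• Conclusions
• Retrospective
• Microarchitectural Updates

Pentium Pro Case Study
Prof. Mikko H. Lipasti
University of Wisconsin-Madison
Lecture notes based on notes by John P. Shen
Updated by Mikko Lipasti

Pentium Pro Case Study
• Microarchitecture
  – Order-3 superscalar
  – Out-of-order execution
  – Speculative execution
  – In-order completion
• Design Methodology
• Performance Analysis
• Retrospective

Goals of P6 Microarchitecture
• IA-32 compliant
• Performance (frequency and IPC)
• Validation
• Die size
• Schedule
• Power

P6 – The Big Picture
[Figure: P6 block diagram. In-order front end: fetch (BTB/ICU), decode, BAC/rename, allocation, and dispatch, with pipeline stages annotated as 2, 4, 2, and 2 cycles. Out-of-order core: a 20-entry reservation station issuing through ports 0 to 4 to IEU0, IEU1, JEU, Fadd, Fmul, Imul, Div, AGU0, and AGU1; the MOB and DCU; the ROB (40 x 157) and RRF.]

Memory Hierarchy
• Level 1 instruction and data caches: 2-cycle access time
• Level 2 unified cache: 6-cycle access time
• Separate address/data buses for the level 2 cache and for memory
[Figure: CPU with an 8 KB ICache and an 8 KB DCache; the bus interface unit (BIU) connects to the 256 KB L2 cache and to main memory and PCI; 64-bit and 16-byte paths labeled.]

Instruction Fetch
[Figure: instruction fetch unit. Next-address logic, the 512-entry branch target buffer (prediction, branch target), and other fetch requests supply the fetch address; the 8 KB ICache (2-cycle), instruction victim cache, stream buffer, and TLB (physical address to the 256 KB L2) deliver 16 bytes through the instruction data mux into the instruction buffer; the instruction rotator and instruction length decoder add length, instruction, and prediction marks and send 16 bytes plus marks to decode.]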
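The latencies on the Memory Hierarchy slide (2 cycles for the L1 caches, 6 cycles for the unified L2) combine into an average access time for the fetch or load path once miss rates are known. The sketch below applies the standard two-level average memory access time (AMAT) formula. The miss rates and main-memory latency are hypothetical values chosen for illustration, not measurements from these notes, and the model assumes that an L1 miss which hits in the L2 costs exactly the 6-cycle L2 access time.

```python
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float,
         mem_latency: float) -> float:
    """Average memory access time in cycles for a two-level hierarchy.

    AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)
    Assumption: the L1 miss penalty on an L2 hit equals the L2 access time.
    """
    l1_miss_penalty = l2_hit + l2_miss_rate * mem_latency
    return l1_hit + l1_miss_rate * l1_miss_penalty


if __name__ == "__main__":
    # Latencies from the slide; miss rates and memory latency are HYPOTHETICAL.
    t = amat(l1_hit=2, l1_miss_rate=0.05,
             l2_hit=6, l2_miss_rate=0.20,
             mem_latency=50)
    print(f"AMAT = {t:.2f} cycles")  # prints "AMAT = 2.80 cycles"
```

Even a small L1 miss rate matters here: the L2 hit latency is three times the L1 hit latency, so every point of L1 miss rate adds directly to the average.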
Instruction Cache Unit
[Figure: instruction cache unit. The lower 12 bits of the fetch address index the 8 KB ICache and the instruction tag array while the ITLB translates the upper 20 bits; the tag comparison yields hit/miss; the stream buffer, victim cache, and bus interface unit feed the instruction data mux.]

Branch Target Buffer
[Figure: 4-way set-associative BTB with 128 sets (Way 0 through Way 3). Each entry holds a fetch address tag, branch offset, target address, 4-bit BHR, and 4-bit speculative BHR. A tag compare selects the entry; a pattern history table (PHT) with 16 entries per set and a return stack feed the prediction control logic, which outputs the prediction and target address.]
• The pattern history table (PHT) is not speculatively updated
• A speculative branch history register (BHR) and prediction state are maintained
• The speculative prediction state is used if it exists for that branch

Branch Prediction Algorithm
• The current prediction updates the speculative history before the next instance of the branch instruction
• The branch history register (BHR) is updated during branch execution
• Branch recovery flushes the front end and drains the execution core
• A branch misprediction resets the speculative branch history state to match the BHR
[Figure: the 4-bit speculative history (e.g., 0010) and the branch history index a 16-entry pattern table (0000 through 1111) of 2-bit state machines; labels: Spec. Pred., Branch Execution, Br. Pred.]
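The BTB and branch prediction slides describe a two-level scheme: a 4-bit branch history selects one of 16 pattern-table entries, the speculative history advances at prediction time, and the real history and pattern table change only when the branch executes. The sketch below models one branch under stated assumptions: each pattern-table entry is a 2-bit saturating counter, one branch owns the whole table (the slide shares 16 entries per BTB set), branches resolve in order, and the class and method names are hypothetical. Front-end flush and core drain on recovery are not modeled.

```python
HIST_BITS = 4                      # 4-bit branch history, as on the BTB slide
HIST_MASK = (1 << HIST_BITS) - 1   # keep only the last 4 outcomes


class TwoLevelPredictor:
    """Toy two-level adaptive predictor for ONE branch (hypothetical model).

    bhr:      non-speculative history, updated at branch execution
    spec_bhr: speculative history, advanced by each prediction
    pht:      16 two-bit saturating counters indexed by history,
              updated only at execution (never speculatively)
    """

    def __init__(self) -> None:
        self.bhr = 0
        self.spec_bhr = 0
        self.pht = [1] * (1 << HIST_BITS)  # start weakly not-taken

    def predict(self) -> bool:
        taken = self.pht[self.spec_bhr] >= 2
        # Shift the prediction into the speculative history so the next
        # instance of this branch sees it before this one executes.
        self.spec_bhr = ((self.spec_bhr << 1) | int(taken)) & HIST_MASK
        return taken

    def execute(self, predicted: bool, taken: bool) -> bool:
        """Resolve the branch at execution; return True on misprediction."""
        # With in-order resolution, the non-speculative history equals the
        # history this branch was predicted with (absent earlier mispredicts).
        idx = self.bhr
        if taken:
            self.pht[idx] = min(3, self.pht[idx] + 1)
        else:
            self.pht[idx] = max(0, self.pht[idx] - 1)
        self.bhr = ((self.bhr << 1) | int(taken)) & HIST_MASK
        mispredicted = predicted != taken
        if mispredicted:
            # Misprediction: reset the speculative history to match the BHR.
            self.spec_bhr = self.bhr
        return mispredicted


def run(pattern, repetitions):
    """Predict, then immediately resolve, a repeating outcome pattern."""
    p = TwoLevelPredictor()
    misses = []
    for _ in range(repetitions):
        for outcome in pattern:
            guess = p.predict()
            misses.append(p.execute(guess, outcome))
    return misses


if __name__ == "__main__":
    m = run([True, True, True, False], repetitions=10)  # taken 3x, then falls through
    print(sum(m), "mispredictions")  # prints "6 mispredictions"
```

For a loop branch taken three times and then not taken, each position in the loop sees a distinct 4-bit history, so all six mispredictions occur in the first two passes while the counters train.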
Instruction Decode - 1
• Branch instruction detection
• Branch address calculation: static prediction and branch always execution
• One branch decode per cycle (break on branch)
[Figure: a 16-byte instruction buffer receives macro-instruction bytes from the IFU and feeds Decoder 0 (up to 4 uops, with the uROM), Decoder 1 (1 uop), and Decoder 2 (1 uop); the branch address calculator sends targets to the next-address calculation; up to 3 uops per cycle enter the 6-entry uop queue for dispatch.]

Instruction Decode - 2
• The instruction buffer (16 bytes) holds up to 16 instructions, which must be decoded and queued before the buffer is refilled
• Macro-instructions must shift from decoder 2 to decoder 1 to decoder 0
[Figure: the same decoder organization as on the previous slide.]

What is a uop?
• A small two-operand instruction, very RISC-like
• IA-32 instruction (a memory-to-memory form used for illustration; real IA-32 add accepts at most one memory operand):
    add (eax),(ebx)       MEM(eax) <- MEM(eax) + MEM(ebx)
• Uop decomposition:
    ld  guop0, (eax)      guop0 <- MEM(eax)
    ld  guop1, (ebx)      guop1 <- MEM(ebx)
    add guop0, guop1      guop0 <- guop0 + guop1
    sta eax               (store address)
    std guop0             MEM(eax) <- guop0   (store data)

Instruction Dispatch
• Register renaming
• Allocation requirements (“3-or-none”):
  – Reorder buffer entries
  – Reservation station entry
  – Load buffer or store buffer entry
• The dispatch buffer “probably” dispatches all 3 uops before refilling
[Figure: the uop queue (6) feeds the renaming logic and allocator, then the dispatch buffer (3) and mux logic to the reservation station, with retirement info fed back; 2 cycles.]

Register Renaming - 1
• Similar to Tomasulo’s algorithm: uses the ROB entry number as the tag
• The register alias tables (RAT) maintain a pointer to the most recent data for each renamed register
• Execution results are stored in the ROB
[Figure: the integer RAT (EAX, EBX, ECX, ...), the floating-point RAT (FST0 through FST7), and temporaries GuoP0 and GuoP1 point into either the reorder buffer (entries 0 to 39) or the real register file (RRF), which holds EAX, EBX, ECX, FST0, FST1, GuoP0, GuoP1, IuoP(0-3), and CC/Events.]

Register Renaming - Example
© Shen, Lipasti
[Figure: the same RAT/ROB/RRF organization, with ROB entries allocated for the dispatching uops and the completing sub marked as complete.]
Dispatching:
    add eax, ebx
    add eax, ecx
    fxch f0, f1
Completing:
    sub eax, ecx

Challenges to Register Renaming
[Figure: the same RAT/ROB/RRF organization.]
• Byte-addressable registers, for example this 8-bit code:
    mov AL, #data1
    mov AH, #data2
    add AL, #data3
    add AL, #data4

Out-of-Order Execution Engine
• In-order branch issue and execution
• In-order load/store issue to address generation units
• Instruction execution and result bus scheduling
• Is the reservation station “truly” centralized, and what is “binding”?
[Figure: the 20-entry reservation station, with RS bypass, issues through ports 0 to 4 to IEU0, IEU1, JEU, Fadd, Fmul, Imul, Div, AGU0, and AGU1, with the MOB and the 8 KB DCU; 2 cycles.]

Reservation Station
• Cycle 1
  – Order checking
  – Operand availability
• Cycle 2
  – Writeback bus scheduling
[Figure: uops from the dispatch queue pass through the two scheduling cycles and issue on ports 0 to 4 to the execution units.]

Memory Ordering Buffer (MOB)
• The load buffer retains loads until they complete, for coherency checking
• Store forwarding out of the store buffers
• 2-cycle latency through the MOB
• “Store coloring”: load instructions are tagged by the last store
[Figure: AGU0 and AGU1 from the reservation station feed the MOB: a 16-entry load buffer, 12-entry store address and store data buffers, conflict logic, bypass logic, and control in front of the 8 KB data cache unit, which returns the load data result; 2-cycle stages.]

Instruction Completion
• Handles all exception/interrupt/trap
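The renaming scheme on the Register Renaming - 1 slide (RAT entries point to ROB entry numbers, and results wait in the ROB) combines with in-order completion into the toy model below. The class and method names (RenameCore, allocate, complete, retire) are hypothetical; the 40-entry default follows the ROB size in the figures. The model ignores reservation stations, partial registers (the AL/AH challenge), flags, and exceptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RobEntry:
    dest: str                      # architectural register this uop writes
    value: Optional[int] = None    # result; filled in when the uop executes


class RenameCore:
    """Toy RAT/ROB/RRF model: RAT entries hold ROB indices as tags."""

    def __init__(self, regs: List[str], rob_size: int = 40) -> None:
        self.rrf: Dict[str, int] = {r: 0 for r in regs}   # retired state
        # None means the current value lives in the RRF.
        self.rat: Dict[str, Optional[int]] = {r: None for r in regs}
        self.rob: List[Optional[RobEntry]] = [None] * rob_size
        self.head = 0    # oldest in-flight uop (next to retire)
        self.tail = 0    # next free ROB slot
        self.count = 0

    def read(self, reg: str) -> Tuple[str, int]:
        """Source lookup: ("value", v) if available, else ("tag", rob_index)."""
        tag = self.rat[reg]
        if tag is None:
            return ("value", self.rrf[reg])
        entry = self.rob[tag]
        if entry.value is not None:
            return ("value", entry.value)
        return ("tag", tag)

    def allocate(self, dest: str) -> int:
        """Give a uop a ROB entry and point dest's RAT entry at it."""
        if self.count == len(self.rob):
            raise RuntimeError("ROB full: allocation stalls")
        tag = self.tail
        self.rob[tag] = RobEntry(dest)
        self.rat[dest] = tag
        self.tail = (self.tail + 1) % len(self.rob)
        self.count += 1
        return tag

    def complete(self, tag: int, value: int) -> None:
        """Execution writes its result into the ROB, possibly out of order."""
        self.rob[tag].value = value

    def retire(self) -> int:
        """Commit finished uops in program order; return how many retired."""
        retired = 0
        while self.count and self.rob[self.head].value is not None:
            entry = self.rob[self.head]
            self.rrf[entry.dest] = entry.value
            if self.rat[entry.dest] == self.head:  # no younger writer renamed it
                self.rat[entry.dest] = None
            self.rob[self.head] = None
            self.head = (self.head + 1) % len(self.rob)
            self.count -= 1
            retired += 1
        return retired


if __name__ == "__main__":
    core = RenameCore(["EAX", "EBX", "ECX"])
    core.rrf.update(EAX=1, EBX=2, ECX=3)
    # add eax, ebx: read sources first (both from the RRF), then rename EAX
    t0 = core.allocate("EAX")
    # add eax, ecx: its EAX source is now ROB tag t0
    print(core.read("EAX"))                 # prints ('tag', 0)
    t1 = core.allocate("EAX")
    core.complete(t0, 1 + 2)
    core.complete(t1, 3 + 3)
    print(core.retire(), core.rrf["EAX"])   # prints 2 6
```

Reading sources before renaming the destination is what lets the second add wait on ROB tag 0 rather than on the stale RRF copy of EAX. Retirement clears a RAT pointer only when no younger uop has renamed the same register.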

