
The AMD Hammer Processor Core
Chetana N. Keltcher
Member of Technical Staff
Advanced Micro Devices
Aug 2002 AMD Hammer Processor Core HotChips 14

Hammer Architecture Overview
• First x86-64 based processor
• Aggressive out-of-order, 9-issue superscalar processor
• Integrated DDR memory controller
• Leading performance in integer, floating point and multimedia
  – x86-64, x87, MMX™, 3DNow!™, SSE, SSE2
[Diagram "Hammer Architecture": Hammer Processor Core, L1 Instruct. Cache, L1 Data Cache, L2 Cache, DDR Memory Controller, HyperTransport™ technology]

Hammer Core Overview
[Block diagram labels: L1 Icache 64KB; Fetch; Branch Prediction; Scan/Align; Fastpath; Microcode Engine; Int Decode & Rename; FP Decode & Rename; µOPs; Instruction Control Unit (72 entries); Res, Res, Res; AGU, ALU, AGU, ALU, MULT, AGU, ALU; 36-entry FP scheduler; FADD, FMUL, FMISC; 44-entry Load/Store Queue; L1 Dcache 64KB; L2 Cache; System Request Queue; Crossbar; Memory Controller; HyperTransport™]

Instruction Fetch
• Supply 16 instruction bytes to the decoder per cycle
• 64KB instruction cache, 2-way set associative
  – Linearly-indexed, physically-tagged, 64-byte block size
  – Prefetch next sequential block on a miss
• 2 sets of instruction cache tags (fetch port, snoop)
• Predecode instruction
  – 1 end bit per-byte
  – Decode some branch types
• Branch prediction
[Core block diagram repeated; here the L2 Cache is labeled 256KB – 1M]

Branch Prediction
• Sequential Fetch
• Predicted Fetch
• Branch Target Address Calculator Fetch
• Mispredicted Fetch
• 5-10% improvement in prediction accuracy vs. AMD Athlon™
[Diagram labels: L2 Cache; Branch Selectors; Evicted Data; Global History Counter (16k, 2-bit counters); Target Array (2k targets); 12-entry Return Address Stack (RAS); Branch Target Address Calculator (BTAC); Execution stages; pipeline stages Pick, DEC1, DEC2, Pack, EDEC, Dispatch, Issue, Execute, Redirect]

Scan / Align
• Convert x86 instructions to fixed length µOPs
• Dispatch 3 µOPs per cycle to integer/FP schedulers
• Instructions use one of two decoding pipelines
  – Fastpath: instructions decoding to two or fewer µOPs are decoded by hardware, packed into 3 dispatch positions
  – Microcode: x86 instructions decoding to more than two µOPs, calculate ROM entry point, fetch sequence from ROM
• Compared to AMD Athlon™, more instructions use the fastpath
  – Eg: Packed SSE is microcoded in AMD Athlon and fastpath in Hammer
  – Hammer has 8% fewer microcoded instructions for Specint2000
  – Hammer has 28% fewer microcoded instructions for Specfp2000

Execution Units
• 3 integer units
• 3 address generation units
• 3 superscalar floating point units
• Integer
  – Full 64-bit data path
  – 3 x 8-entry reservation stations
  – Single cycle 32 and 64-bit add, sub, rotate, shift, logical, etc.
  – 32-bit multiply: 3 cycle latency
  – 64-bit multiply: 5 cycle latency
• Floating point
  – Handles x87, MMX™, 3DNow!™, SSE and SSE2
  – 36-entry scheduler
  – Out-of-order, fully pipelined design

Load/Store and Data Cache
• 64KB data cache
  – 2-way set associative
  – Linearly-indexed, physically-tagged
  – 40-bit physical address
  – 48-bit linear address
  – MOESI coherency
  – 64-byte block size
• Banked and dual ported
  – 2 64-bit reads/writes each cycle to different banks
• 3 sets of data cache tags (port A, port B, snoop)
• Load->use latency is 3 cycles (zero segment base)
  – 1 extra cycle to handle misaligned (quadword boundary) loads
• Data forwarding from stores to dependent loads
• Hardware prefetch

L2 Cache
• Configurable sizes up to 1MB
• 16-way set associative
• L1 and L2 storage is mutually exclusive
• Pseudo-LRU scheme to reduce the number of LRU bits by half
• Stores IC predecode and branch prediction bits
• 10 outstanding miss requests
  – 8 DC
  – 2 IC
• System interface
  – Victim Buffer (8-entry)
  – Snoop Buffer (8-entry)
  – Write Buffer (4-entry)

TLB for Large Workloads
[Diagram labels:
  L1 Instruction TLB: 40 entry, fully associative, 4M/2M & 4k pages
  L2 Instruction TLB: 512-entry, 4-way associative, 4k pages
  L1 Data TLB: 40 entry, fully associative, 4M/2M & 4k pages
  L2 Data TLB: 512-entry, 4-way associative, 4k pages
  24-entry Page Descriptor Cache: PML4, PDP, PDE
  Flush Filter CAM, 32 entry: CR3, PDP, PDE; Snoop, Modify
  Other labels: TLB Reload, PDC Reload, Table Walk, ASN, L2 Cache
  Linear address fields (bit ranges):
    Sign    63-48
    PML4    47-39
    PDP     38-30
    PDE     29-21
    PTE     20-12
    Offset  11-0]

Integrated Memory Controller
• Integrated DDR memory controller
  – 8-byte or 16-byte interface
  – Unbuffered or Registered DIMMs
  – 16-byte interface supports direct connection to 8 registered DIMMs and chipkill ECC
  – Significantly reduces memory latency
  – Memory latency improves as CPU and HyperTransport™ link speed improves
  – Performance improves by approximately 20% compared to AMD Athlon™ topology
  – Snoop throughput scales with CPU frequency
• Integrated Northbridge Functionality
  – Processes requests from CPU/IO to DRAM/IO
  – HyperTransport™ routing
    • peak bandwidth = 6.4GB/s
  – Handles transaction ordering and cache coherence
  – Runs at the same frequency as CPU core
[Diagram labels: System Request Queue, Crossbar, HyperTransport™ links]
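
The "TLB for Large Workloads" slide gives the bit ranges of the linear-address fields used by the table walk (Sign 63-48, PML4 47-39, PDP 38-30, PDE 29-21, PTE 20-12, Offset 11-0). A minimal sketch of that decomposition for the 4k-page case follows. The function names `split_linear_address` and `is_canonical` are hypothetical, and the rule that the Sign field replicates bit 47 comes from the x86-64 architecture definition rather than from the slide itself.

```python
# Field layout copied from the TLB slide: (name, high bit, low bit), inclusive.
FIELDS = [
    ("sign", 63, 48),
    ("pml4", 47, 39),
    ("pdp", 38, 30),
    ("pde", 29, 21),
    ("pte", 20, 12),
    ("offset", 11, 0),
]


def split_linear_address(addr: int) -> dict:
    """Return each field of a 64-bit linear address (4k-page layout)."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("address must fit in 64 bits")
    fields = {}
    for name, hi, lo in FIELDS:
        width = hi - lo + 1
        fields[name] = (addr >> lo) & ((1 << width) - 1)
    return fields


def is_canonical(addr: int) -> bool:
    """Assumption (x86-64 rule, not stated on the slide):
    bits 63-48 must all equal bit 47."""
    sign = (addr >> 48) & 0xFFFF
    bit47 = (addr >> 47) & 1
    return sign == (0xFFFF if bit47 else 0x0000)


if __name__ == "__main__":
    for name, value in split_linear_address(0xFFFF800000000000).items():
        print(f"{name:>6}: {value:#x}")
```

Each of the four table-index fields is 9 bits wide, so every level of the walk selects one of 512 entries, and the 12-bit offset covers a 4k page.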
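
The L1 cache parameters on the "Instruction Fetch" and "Load/Store and Data Cache" slides (64KB, 2-way set associative, 64-byte block size) fix the number of sets and the width of the index and block-offset fields. The sketch below works through that arithmetic. The `CacheGeometry` class is a hypothetical helper for illustration, and it assumes power-of-two sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper; assumes all three parameters are powers of two."""
    size_bytes: int
    ways: int
    block_bytes: int

    @property
    def sets(self) -> int:
        # Total blocks divided by blocks per set.
        return self.size_bytes // (self.ways * self.block_bytes)

    @property
    def offset_bits(self) -> int:
        # Bits needed to address a byte within one block.
        return (self.block_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # Bits needed to select one set.
        return (self.sets - 1).bit_length()


# Values from the slides: 64KB, 2-way, 64-byte blocks.
L1 = CacheGeometry(size_bytes=64 * 1024, ways=2, block_bytes=64)

if __name__ == "__main__":
    print(L1.sets, L1.offset_bits, L1.index_bits)  # 512 6 9
```

With 6 offset bits and 9 index bits, a set is selected by address bits 14-6, which extends past the 12-bit offset of a 4k page. That is consistent with the slides describing the L1 caches as linearly-indexed and physically-tagged.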



UMD ENEE 759H - The AMD Hammer Processor Core
