UVA CS 451 - Intel Pentium 4 Processor


Slide titles

1. Intel Pentium 4 Processor
2. Outline
3. Introduction
4. IA-32
5. IA-32 (cont’d)
6. Pentium III vs. Pentium 4 Pipeline
7. Comparison Between Pentium3 and Pentium4
8. Execution on MPEG4 Benchmarks @ 1 GHz
9. Instruction Set Architecture
10. SSE2
11. Comparison Between SSE and SSE2
12. Packing
13. Hardware Support for SSE2
14. SSE2 Instructions (1)
15. SSE2 Instructions (2)
16. SSE2 Instructions (3)
17. Conclusion
18. Instruction Stream
19. Slide 19
20. Front End
21. Prefetch
22. Decoder
23. Trace Cache
24. What is a Trace Cache?
25. Pentium 4 Trace Cache
26. Microcode ROM
27. Branch Prediction
28. Slide 28
29. Branch Target Buffer
30. Return Address Stack
31. Branch Hints
32. Out-of-Order Execution
33. Issue
34. Execution
35. Execution Units
36. Double-pumped ALUs
37. Retirement
38. Execution Pipeline
39. Slide 39
40. Data Stream of Pentium 4 Processor
41. Register Renaming
42. Register Renaming (2)
43. On-chip Caches
44. L1 Instruction Cache
45. L1 Data Cache
46. L2 Cache
47. Data Prefetcher in L2 Cache
48. Store and Load
49. Store
50. Store-to-Load Forwarding
51. System Bus
52. Slide 52
53. What Went Wrong
54. No L3 cache
55. Small L1 Cache
56. Loses consistently to AMD
57. Northwood
58. Slide 58
59. Prescott
60. Slide 60

Intel Pentium 4 Processor
Presented by Michele Co
(much slide content courtesy of Zhijian Lu and Steve Kelley)

Outline
Introduction (Zhijian)
  – Willamette (11/2000)
Instruction Set Architecture (Zhijian)
Instruction Stream (Steve)
Data Stream (Zhijian)
What went wrong (Steve)
Pentium 4 revisions
  – Northwood (1/2002)
  – Xeon (Prestonia, ~2002)
  – Prescott (2/2004)
Dual Core
  – Smithfield

Introduction
Intel Pentium 4 processor
  – Latest IA-32 processor equipped with a full set of IA-32 SIMD operations
First implementation of a new micro-architecture called “NetBurst” by Intel (11/2000)

IA-32
Intel architecture 32-bit (IA-32)
  – 80386 instruction set (1985)
  – CISC, 32-bit addresses
“Flat” memory model
Registers
  – Eight 32-bit registers
  – Eight FP stack registers
  – 6 segment registers

IA-32 (cont’d)
Addressing modes
  – Register indirect (mem[reg])
  – Base + displacement (mem[reg + const])
  – Base + scaled index (mem[reg + (2^scale x index)])
  – Base + scaled index + displacement (mem[reg + (2^scale x index) + displacement])
SIMD instruction sets
  – MMX (Pentium II)
    » Eight 64-bit MMX registers, integer ops only
  – SSE (Streaming SIMD Extension, Pentium III)
    » Eight 128-bit registers

Pentium III vs. Pentium 4 Pipeline

Comparison Between Pentium3 and Pentium4

Execution on MPEG4 Benchmarks @ 1 GHz

Instruction Set Architecture
Pentium4 ISA = Pentium3 ISA + SSE2 (Streaming SIMD Extensions 2)
SSE2 is an architectural enhancement to the IA-32 architecture

SSE2
Extends MMX and the SSE extensions with 144 new instructions:
  – 128-bit SIMD integer arithmetic operations
  – 128-bit SIMD double precision floating point operations
  – Enhanced cache and memory management operations

Comparison Between SSE and SSE2
Both support operations on 128-bit XMM register
SSE only supports 4 packed single-precision floating-point values
SSE2 supports more:
  – 2 packed double-precision floating-point values
  – 16 packed byte integers
  – 8 packed word integers
  – 4 packed doubleword integers
  – 2 packed quadword integers
  – Double quadword

Packing
128 bits (word = 2 bytes)
  Quad word    Double word    Double word
  64 bit       32 bit         32 bit

Hardware Support for SSE2
Adder and Multiplier units in the SSE2 engine are 128 bits wide, twice the width of that in Pentium3
Increased bandwidth in load/store for floating-point values
load and store are 128-bit wide
One load plus one store can be completed between XMM register and L1 cache in one clock cycle

SSE2 Instructions (1)
Data movements
  – Move data between XMM registers and between XMM registers and memory
Double precision floating-point operations
  – Arithmetic instructions on both scalar and packed values
Logical Instructions
  – Perform logical operations on packed double precision floating-point values

SSE2 Instructions (2)
Compare instructions
  – Compare packed and scalar double precision floating-
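
The four addressing modes on the "IA-32 (cont’d)" slide are all special cases of one effective-address computation, base + (2^scale x index) + displacement. The Python sketch below illustrates this. The function name `effective_address`, its parameter names, and the register values are assumptions made for illustration and do not come from the slides.

```python
def effective_address(base, index=0, scale=0, displacement=0):
    """Return base + (2**scale * index) + displacement, wrapped to 32 bits.

    `scale` is the exponent in the slides' 2^scale notation. IA-32 encodes it
    in two bits, so only 0..3 (factors 1, 2, 4, 8) are accepted here.
    """
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale exponent must be in 0..3")
    # Mask to 32 bits because IA-32 addresses are 32 bits wide.
    return (base + (index << scale) + displacement) & 0xFFFFFFFF


# Hypothetical register contents, chosen only for illustration.
base, index = 0x1000, 3

# Register indirect: mem[reg]
print(hex(effective_address(base)))
# Base + displacement: mem[reg + const]
print(hex(effective_address(base, displacement=8)))
# Base + scaled index: mem[reg + (2^scale x index)]
print(hex(effective_address(base, index=index, scale=2)))
# Base + scaled index + displacement
print(hex(effective_address(base, index=index, scale=2, displacement=8)))
```

With scale = 2 the index is multiplied by 4, which matches stepping through an array of 32-bit elements.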
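
The packed data types on the "Comparison Between SSE and SSE2" and "Packing" slides are different ways of dividing the same 128 bits into lanes. The Python sketch below models that, using the standard `struct` module to reinterpret one 16-byte value. It is not how the hardware works internally. The names `lanes` and `packed_add_words` and the byte values are assumptions made for illustration.

```python
import struct

# 16 arbitrary bytes standing in for the contents of one 128-bit XMM register.
xmm = bytes(range(16))

# struct format strings; "<" selects little-endian byte order (as on IA-32)
# with standard sizes: B = 1 byte, H = 2, I = 4, Q = 8, f = 4, d = 8.
VIEWS = {
    "packed byte integers": "<16B",
    "packed word integers": "<8H",
    "packed doubleword integers": "<4I",
    "packed quadword integers": "<2Q",
    "packed single-precision (SSE)": "<4f",
    "packed double-precision (SSE2)": "<2d",
}


def lanes(register, fmt):
    """Reinterpret the 16 bytes of `register` as the lanes described by `fmt`."""
    return struct.unpack(fmt, register)


def packed_add_words(a, b):
    """Add eight 16-bit lanes independently, wrapping each lane on overflow."""
    sums = ((x + y) & 0xFFFF for x, y in zip(lanes(a, "<8H"), lanes(b, "<8H")))
    return struct.pack("<8H", *sums)


for name, fmt in VIEWS.items():
    print(f"{name}: {len(lanes(xmm, fmt))} lanes")

a = struct.pack("<8H", 0xFFFF, 1, 2, 3, 4, 5, 6, 7)
b = struct.pack("<8H", 1, 1, 1, 1, 1, 1, 1, 1)
# The overflow in lane 0 wraps to 0 and does not carry into lane 1.
print(lanes(packed_add_words(a, b), "<8H"))
```

A single packed operation applies the same arithmetic to every lane, and no carry crosses a lane boundary.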

