UVA CS 451 - Intel Multimedia Extensions and Hyper-Threading


Intel Multimedia Extensions and Hyper-Threading
Michele Co
CS451

Outline
• Evolution of Intel multimedia extensions
  – x87 (386)
  – MMX (Pentium MMX, Pentium II)
  – SSE (Pentium III)
  – SSE2 (Pentium 4 – Willamette)
  – SSE3 (Pentium 4 – Prescott)
• Hyper-Threading

x87 FPU
• 8 80-bit data registers (double extended precision floating point)
• Data registers treated as a stack
• Control register: FP precision, rounding, …
• Status register: FPU busy, TOS, CC, error, exception, …
• Tag register: (2 bits) valid, zero, special, empty
• Last instruction pointer register
• Last data (operand) pointer register
• Opcode register

x87 FPU State

x87 Data Types

x87 Instructions
• Data transfer (load, store, move)
• Basic arithmetic
• Comparison
• Transcendental (trigonometric, log, exp)
• Load constant
• x87 FPU control

MMX
• SIMD execution
• 8 64-bit data registers (MMX)
  – Aliased to x87 FPU registers
• Randomly accessible

SIMD Execution

MMX State

MMX Registers

MMX Data Types

MMX Instructions
• Data transfer
• Arithmetic
• Comparison
• Conversion
• Unpacking
• Logical
• Shift
• Empty MMX state

SSE
• Pentium III
• 8 128-bit data registers (XMM)
  – Independent of x87 FPU and MMX registers
• SSE instructions can be executed in parallel with MMX/x87
• MXCSR register: control and status for XMM registers (similar to x87 status register)
• EFLAGS register: results of compare ops
• 128-bit packed single-precision FP data type
• Prefetching, cacheability, store ordering control instructions

SSE State

XMM Registers

SSE Data Type

SSE Instructions
• Packed and scalar single-precision floating point
• Logical
• Conversion
• 64-bit SIMD integer
• MXCSR management
• State management
• Cacheability control, prefetch, memory ordering
  – SFENCE (store fence)
• FXSAVE, FXRSTOR: extend the x87 fast save and restore of x87 and MMX registers to also save/restore the XMM and MXCSR registers

Packed Single-Precision FP Operation

Scalar Single-Precision FP Operation

Shuffle

Unpack and Interleave

SSE2
• Pentium 4
• More data types
• More instructions to support new data types

SSE2 State

SSE2 Data Types

SSE2 Instructions
• Support for additional types
• CLFLUSH (cache line flush)
• LFENCE (load fence)
• MFENCE (load + store fence)

Packed Double-Precision FP Operations

Scalar Double-Precision FP Operations

SSE3
• Pentium 4 (Prescott)
  – Support for Hyper-Threading
• 13 new instructions
  – 10 SIMD support instructions
  – 1 x87 accelerating instruction (FP to int conversion)
  – Synchronization of threads
    • MONITOR (monitor write-back stores)
    • MWAIT (wait for write-back store)
• No new state

Asymmetric Processing

Horizontal Data Movement

Hyper-Threading

Terminology
• Process
  – Program associated with a context (state: registers, program counter, flags, etc.)
  – Consists of one or more threads
• Thread
  – "Lightweight process" (less state)

Hyper-Threading
• Single physical processor appears as 2 logical processors
• Thread Level Parallelism (TLP)
  – Many applications have software threads that can be executed simultaneously
    • Online transaction processing
    • Web services
• Latency can leave execution units idle
  – Cache misses
  – Branch mispredictions
  – Waiting for loads/stores

Techniques for Minimizing Effect of Long Latency
• Chip multiprocessing (CMP)
  – 2 processors on a single die
  – Larger than a single-core chip, more expensive to manufacture
• Time-slice or switch-on-event multithreading
  – Switch threads after a fixed time period or on long-latency events like cache misses
  – Doesn't take advantage of other sources of inefficient resource usage (branch mispredictions, instruction dependencies, etc.)
• Simultaneous multithreading (SMT)
  – Multiple threads execute on a single processor without switching
  – Hyper-Threading is Intel's implementation

Intel Hyper-Threading Demo

Resource Requirements for HT
Need to maintain 2 contexts
• Replicated
  – Register renaming logic (RAT)
  – Instruction pointer
  – ITLB
  – Return stack predictor
  – Various other architectural registers (GP, control, APIC, machine state)
• Partitioned
  – Re-order buffers (ROBs)
  – Load/store buffers
  – Various queues, like the scheduling queues, uop queue, etc.
• Shared
  – Caches: trace cache, L1, L2, L3, microcode ROM
  – Microarchitectural registers
  – Execution units

Hyper-Threading Goals
• Minimize die area cost for implementing
• Ensure forward progress by at least one logical processor
• Maintain single-threaded performance

Frontend Changes
• 2 PCs
• Arbitration for shared resource access
  – Trace cache, microcode ROM, caches
  – One logical processor at a time per structure
• Thread tags per trace cache entry
• Microcode ROM: 2 microcode instruction pointers
• Wider pipeline latches to hold state for 2 contexts
• Branch prediction
  – RAS and branch history buffer duplicated
  – Global history shared, but tagged with logical processor ID

Trace Cache Hit

Trace Cache Miss

Hyper-Threaded Execution

Execution Modes
• Single-task (ST), Multi-task (MT)
  – ST0, ST1
  – HALT: transitions ST modes depending on logical processor executing
  – Interrupt sent to halted processor transitions to MT

HT Performance - OLTP

HT Performance - Web Server

