UCR CS 162 - Lecture 2: Introduction & Pipelining

CS 162 Computer Architecture
Lecture 2: Introduction & Pipelining
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan/cs162

Contents: Review of Last Class; What is Multiprocessing; Memory Latency Problem; Hardware Multithreading; Architectural Comparisons (cont.); Intel IXP1200/IXP2400 network processor (Slides 7-11); Review: Single-cycle Datapath for MIPS; Stages of Execution in Pipelined MIPS; Pipelined Execution Representation; Datapath Timing: Single-cycle vs. Pipelined; Pipelining Lessons; Single Cycle Datapath; Required Changes to Datapath; Changes to Datapath Contd.; Pipelined Datapath (with Pipeline Regs) (6.2)

Slide 2 - Review of Last Class
° MIPS Datapath
° Introduction to Pipelining
° Introduction to Instruction Level Parallelism (ILP)
° Introduction to VLIW

Slide 3 - What is Multiprocessing
° Parallelism at the instruction level is limited because of data dependences => speedup is limited!!
° There is abundant program-level parallelism, e.g. a loop such as DO I = 1, 1000 (loop-level parallelism). How about employing multiple processors to execute the loop iterations? => parallel processing, or multiprocessing
° With a billion transistors on a chip, we can put a few CPUs on one chip => chip multiprocessor

Slide 4 - Memory Latency Problem
Even if we increase CPU power, memory is the real bottleneck. Techniques to alleviate the memory latency problem:
1. Memory hierarchy - program locality, cache memory, multiple levels, pages and context switching
2. Prefetching - get the instruction/data before the CPU needs it. Good for instructions because of sequential locality, so all modern processors use prefetch buffers for instructions. What to do with data?
3. Multithreading - can the CPU jump to another program while it is waiting on memory? It's like multiprogramming!!

Slide 5 - Hardware Multithreading
° We need a hardware multithreading technique because switching between threads in software is very time-consuming (why?), so it is not suitable for hiding main-memory (as opposed to I/O) access latency. Ex: multitasking
° Provide multiple PCs and register sets on the CPU so that thread switching can occur without having to store the register contents in main memory (on the stack, as is done for context switching).
° Several threads reside in the CPU simultaneously, and execution switches between the threads on a main-memory access.
° How about both multiprocessing and multithreading on a chip? => network processor
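Slide 5's idea is that keeping several PCs and register sets on the chip lets the processor switch to another ready thread whenever the current one starts a long memory access, instead of stalling. The following is a minimal software sketch of that coarse-grained switching policy; it is not from the lecture, and the context count, "programs", and 3-cycle memory latency are made-up values for illustration.

```c
/* Sketch (illustrative, not from the lecture): coarse-grained hardware
 * multithreading. Each context has a private PC and register set, so the
 * "CPU" switches to the next ready thread when the current one issues a
 * long-latency memory access. Build: cc -std=c99 -o hwmt hwmt.c
 */
#include <stdio.h>

#define NCTX        4     /* hardware thread contexts (multiple PCs/reg sets) */
#define NREGS       4
#define PROG_LEN    6
#define MEM_LATENCY 3     /* cycles a memory access keeps a thread waiting    */

typedef enum { OP_ALU, OP_MEM } Op;

typedef struct {
    int pc;               /* private program counter               */
    int regs[NREGS];      /* private register set                  */
    int wait;             /* cycles left until memory data is back */
    Op  prog[PROG_LEN];   /* the thread's instruction stream       */
} Context;

int main(void) {
    Context ctx[NCTX] = {0};
    for (int t = 0; t < NCTX; t++)
        for (int i = 0; i < PROG_LEN; i++)          /* every 3rd instruction is a load */
            ctx[t].prog[i] = (i % 3 == 2) ? OP_MEM : OP_ALU;

    int cur = 0;
    for (int cycle = 0; cycle < 40; cycle++) {
        for (int t = 0; t < NCTX; t++)              /* outstanding accesses count down */
            if (ctx[t].wait > 0) ctx[t].wait--;

        /* pick the current thread if it is ready, otherwise the next ready one */
        int picked = -1;
        for (int k = 0; k < NCTX; k++) {
            int t = (cur + k) % NCTX;
            if (ctx[t].wait == 0 && ctx[t].pc < PROG_LEN) { picked = t; break; }
        }
        if (picked < 0) { printf("cycle %2d: nothing ready\n", cycle); continue; }

        Context *c = &ctx[picked];
        Op op = c->prog[c->pc];
        printf("cycle %2d: thread %d executes %s (pc=%d)\n",
               cycle, picked, op == OP_MEM ? "MEM" : "ALU", c->pc);
        c->regs[c->pc % NREGS]++;                   /* stand-in for real work */
        c->pc++;
        if (op == OP_MEM) { c->wait = MEM_LATENCY; cur = (picked + 1) % NCTX; }
        else              { cur = picked; }
    }
    return 0;
}
```

With four contexts and a 3-cycle miss penalty, the run shows almost no idle cycles: whenever one thread waits on memory, another is ready to issue, which is exactly the utilization argument behind the network-processor designs on the next slides.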
Slide 6 - Architectural Comparisons (cont.)
(Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading machines; slots are shaded for Thread 1 through Thread 5 or marked as idle slots.)

Slide 7 - Intel IXP1200 Network Processor
° Initial component of the Intel Exchange Architecture (IXA)
° Each microengine is a 5-stage pipeline - no ILP, 4-way multithreaded
° 7-core multiprocessing - 6 microengines and a StrongARM core
° 166 MHz fundamental clock rate
° Intel claims 2.5 Mpps IP routing for 64-byte packets
° Already the most widely used NPU (or, more accurately, the most widely admitted use)

Slide 8 - IXP1200 Chip Layout
° StrongARM processing core
° Microengines introduce a new ISA
° I/O: PCI, SDRAM, SRAM, IX (a PCI-like packet bus)
° On-chip FIFOs: 16 entries, 64 bytes each

Slide 9 - IXP1200 Microengine
° 4 hardware contexts
° Single-issue processor
° Explicit, optional context switch on SRAM access
° Registers: all single-ported; separate GPRs; 1536 registers total
° 32-bit ALU: can access GPR or XFER registers
° Standard 5-stage pipe
° 4 KB SRAM instruction store - not a cache!

Slide 10 - Intel IXP2400 Microengine (New)
° XScale core replaces the StrongARM
° 1.4 GHz target in a 0.13-micron process
° Nearest-neighbor routes added between microengines
° Hardware to accelerate CRC operations and random-number generation
° 16-entry CAM

Slide 11 - MIPS Pipeline
Chapter 6, CS 161 text

Slide 12 - Review: Single-cycle Datapath for MIPS
(Figure: the single-cycle datapath - PC, Instruction Memory (Imem), Registers, ALU, Data Memory (Dmem) - partitioned into Stage 1 through Stage 5: IFtch, Dcd, Exec, Mem, WB.)
° Use the datapath figure to represent the pipeline: IM -> Reg -> ALU -> DM -> Reg

Slide 13 - Stages of Execution in Pipelined MIPS
5-stage instruction pipeline:
1) I-fetch: fetch instruction, increment PC
2) Decode: decode instruction, read registers
3) Execute: memory reference: calculate address; R-format: perform ALU operation
4) Memory: load: read data from data memory; store: write data to data memory
5) Write Back: write data to a register

Slide 14 - Pipelined Execution Representation
° To simplify the pipeline, every instruction takes the same number of steps, called stages
° One clock cycle per stage
(Figure: five instructions, each passing through IFtch, Dcd, Exec, Mem, WB, staggered by one cycle; program flow runs down the page, time runs to the right.)

Slide 15 - Datapath Timing: Single-cycle vs. Pipelined
° Assume the following delays for the major functional units:
  • 2 ns for a memory access or ALU operation
  • 1 ns for a register file read or write
° Total datapath delay for the single-cycle machine:

  Insn     Insn    Reg    ALU    Data    Reg     Total
  Type     Fetch   Read   Oper   Access  Write   Time
  beq      2 ns    1 ns   2 ns   -       -       5 ns
  R-form   2 ns    1 ns   2 ns   -       1 ns    6 ns
  sw       2 ns    1 ns   2 ns   2 ns    -       7 ns
  lw       2 ns    1 ns   2 ns   2 ns    1 ns    8 ns

° In the pipelined machine, each stage = the length of the longest delay = 2 ns, so one instruction takes 5 stages = 10 ns
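As a quick check on slides 14 and 15, here is a small sketch (my own, with hypothetical variable names, not part of the lecture) that prints the staggered five-stage diagram and compares the total time for n instructions on the single-cycle machine (8 ns per instruction, set by the slowest lw path) against the 5-stage pipeline (2 ns per stage, including the cycles needed to fill and drain the pipe).

```c
/* Sketch (illustrative): pipeline overlap and the timing comparison from the
 * slides. The single-cycle clock must cover the slowest instruction (lw, 8 ns);
 * the pipelined clock only has to cover the slowest stage (2 ns).
 */
#include <stdio.h>

#define STAGES 5
static const char *stage_name[STAGES] = { "IFtch", "Dcd", "Exec", "Mem", "WB" };

int main(void) {
    const double single_cycle_ns = 8.0;   /* lw path: 2 + 1 + 2 + 2 + 1 ns    */
    const double stage_ns        = 2.0;   /* longest stage: memory access/ALU */
    const int    n               = 5;     /* instructions in the example      */

    /* Staggered diagram: instruction i is in stage (c - i) during cycle c. */
    for (int i = 0; i < n; i++) {
        printf("insn %d: ", i + 1);
        for (int c = 0; c < n + STAGES - 1; c++) {
            int s = c - i;
            printf("%-6s ", (s >= 0 && s < STAGES) ? stage_name[s] : ".");
        }
        printf("\n");
    }

    double t_single = n * single_cycle_ns;
    double t_pipe   = (n + STAGES - 1) * stage_ns;   /* fill and drain included */
    printf("\nsingle-cycle: %.0f ns   pipelined: %.0f ns   speedup: %.2fx\n",
           t_single, t_pipe, t_single / t_pipe);
    /* For large n the speedup approaches 8/2 = 4x here, short of the ideal 5x
     * (the number of stages), because 8 ns of work is stretched over five
     * unequal 2 ns stages.                                                    */
    return 0;
}
```

With n = 5 this prints a speedup of only about 2.2x, which illustrates the "fill and drain" effect listed among the pipelining lessons on the next slide.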
Slide 16 - Pipelining Lessons
° Pipelining doesn't help the latency (execution time) of a single task; it helps the throughput of the entire workload
° Multiple tasks operate simultaneously using different resources
° Potential speedup = number of pipe stages
° Time to "fill" the pipeline and time to "drain" it reduce the speedup
° The pipeline rate is limited by the slowest pipeline stage
° Unbalanced lengths of pipe stages also reduce the speedup

Slide 17 - Single Cycle Datapath (from Ch. 5)
(Figure: the full single-cycle datapath - PC, Imem, register file (ReadReg1, ReadReg2, WriteReg, Read data 1, Read data 2, WriteData), sign extend, ALU with Zero output, Dmem (Address, Read data, Write data), the PC+4 adder and the branch adder with "<< 2", instruction fields 25:21, 20:16, 15:11, 15:0, and the muxes/control signals RegDst, RegWrite, ALUSrc, ALUOp, ALU control, MemRead, MemWrite, MemToReg, PCSrc.)

Slide 18 - Required Changes to Datapath
° Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
° The next PC value is computed in the 3rd stage, but we need to bring in the next instruction in the next cycle - move the PCSrc mux to the 1st stage. The PC is incremented unless there is a new branch address.
° The branch address is computed in the 3rd stage. With the pipeline, the PC value has already changed! We must carry the PC value along with the instruction. Width of the IF/ID register = (IR) + (PC) = 64 bits.

Slide 19 - Changes to Datapath Contd.
° For the lw instruction, we need the write-register address at stage 5, but the IR ...
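Slide 18's pipeline registers can be thought of as latches that carry everything the later stages will still need forward by one stage per cycle: the fetched instruction plus its PC (the 64-bit IF/ID register), the register values and destination-register number, and so on through MEM/WB. Here is a rough sketch of that bookkeeping in C; the field names and groupings are my own, not definitions from the lecture or the textbook.

```c
/* Sketch (field names are illustrative): the four pipeline registers that
 * separate the five MIPS stages. Each holds what the downstream stages still
 * need; e.g. IF/ID is 32-bit IR + 32-bit PC = 64 bits, as on slide 18, and the
 * write-register number has to ride along all the way to WB for lw (slide 19).
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {              /* IF/ID: written by Fetch, read by Decode       */
    uint32_t instr;           /* IR: the fetched instruction                   */
    uint32_t pc;              /* PC carried along for branch-address calc.     */
} IF_ID;

typedef struct {              /* ID/EX: written by Decode, read by Execute     */
    uint32_t pc;
    uint32_t reg_a, reg_b;    /* values read from the register file            */
    int32_t  imm;             /* sign-extended immediate                       */
    uint8_t  rt, rd;          /* candidate destination register numbers        */
} ID_EX;

typedef struct {              /* EX/MEM: written by Execute, read by Memory    */
    uint32_t branch_target;
    uint32_t alu_result;
    uint32_t store_data;
    uint8_t  dest_reg;        /* write-register number, still being carried    */
} EX_MEM;

typedef struct {              /* MEM/WB: written by Memory, read by Write Back */
    uint32_t mem_data;        /* data loaded from Dmem (for lw)                */
    uint32_t alu_result;      /* ALU result (for R-format)                     */
    uint8_t  dest_reg;        /* finally used to write the register file       */
} MEM_WB;

int main(void) {
    /* The architectural payload of IF/ID is IR + PC = 32 + 32 = 64 bits
     * (struct padding aside), matching the width quoted on slide 18.          */
    printf("IF/ID payload: %d bits\n", (int)(8 * 2 * sizeof(uint32_t)));
    return 0;
}
```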

