Berkeley COMPSCI 152 - Advanced Processors I - D2361436

School name University of California, Berkeley

Course Compsci 152- Computer Architecture and Engineering

Pages 23

This preview shows page 1-2-22-23 out of 23 pages.

Unformatted text preview:

UC Regents Fall 2005 © UCB
CS 152 L17: Advanced Processors I
2005-10-27
John Lazzaro (www.cs.berkeley.edu/~lazzaro)

CS 152 Computer Architecture and Engineering
Lecture 17 – Advanced Processors I
www-inst.eecs.berkeley.edu/~cs152/
TAs: David Marquardt and Udam Saini

Last Time: Error Correcting Codes

Bit:             D₃ D₂ D₁ P₂ D₀ P₁ P₀
We write:        0  1  1  0  0  1  1
Later, we read:  0  1  0  0  0  1  1

Cosmic ray hit D₁. But how do we know that?

On readout we compute:
P₀ xor D₃ xor D₁ xor D₀ = 1 xor 0 xor 0 xor 0 = 1
P₁ xor D₃ xor D₂ xor D₀ = 1 xor 0 xor 1 xor 0 = 0
P₂ xor D₃ xor D₂ xor D₁ = 0 xor 0 xor 1 xor 0 = 1

P₂P₁P₀ = b101 = 5

What does “5” mean? The position of the flipped bit! To repair, just flip it back ...

Bit:       D₃ D₂ D₁ P₂ D₀ P₁ P₀
Position:  7  6  5  4  3  2  1

Note: we number the least significant bit with 1, not 0! 0 is reserved for “no errors”.

Today: Beyond the 5-stage pipeline

Taxonomy: Introduction to advanced processor techniques.
Superpipelining: Increasing the number of pipeline stages.
Superscalar: Issuing several instructions in a single cycle.

5 Stage Pipeline: A point of departure

CS 152 L10 Pipeline Intro (9) Fall 2004 © UC Regents
Graphically Representing MIPS Pipeline
[Figure: pipeline diagram with stages IM, Reg, ALU, DM, Reg]
Can help with answering questions like:
- how many cycles does it take to execute this code?
- what is the ALU doing during cycle 4?
- is there a hazard, why does it occur, and how can it be fixed?

Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

At best, the 5-stage pipeline executes one instruction per clock, with a clock period determined by the slowest stage.
- Filling all delay slots (branch, load)
- Perfect caching
- Application does not need multi-cycle instructions (multiply, divide, etc)

Superpipelining: Add more stages

Today!

Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

Goal: Reduce critical path by adding more pipeline stages.
Difficulties: Added penalties for load delays and branch misses.
Ultimate Limiter: As logic delay goes to 0, FF clk-to-Q and setup.

1600 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Fig. 1. Process SEM cross section.

The process [...] was raised from [1] to limit standby power. Circuit design and architectural pipelining ensure low voltage performance and functionality. To further limit standby current in handheld ASSPs, a longer poly target takes advantage of the [...] versus [...] dependence and source-to-body bias is used to electrically limit transistor [...] in standby mode. All core nMOS and pMOS transistors utilize separate source and bulk connections to support this. The process includes cobalt disilicide gates and diffusions. Low source and drain capacitance, as well as 3-nm gate-oxide thickness, allow high performance and low-voltage operation.

III. ARCHITECTURE

The microprocessor contains 32-kB instruction and data caches as well as an eight-entry coalescing writeback buffer. The instruction and data cache fill buffers have two and four entries, respectively. The data cache supports hit-under-miss operation and lines may be locked to allow SRAM-like operation. Thirty-two-entry fully associative translation lookaside buffers (TLBs) that support multiple page sizes are provided for both caches. TLB entries may also be locked. A 128-entry branch target buffer improves branch performance a pipeline deeper than earlier high-performance ARM designs [2], [3].

A. Pipeline Organization

To obtain high performance, the microprocessor core utilizes a simple scalar pipeline and a high-frequency clock. In addition to avoiding the potential power waste of a superscalar approach, functional design and validation complexity is decreased at the expense of circuit design effort. To avoid circuit design issues, the pipeline partitioning balances the workload and ensures that no one pipeline stage is tight.
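
The slide's two timing points (the slowest stage sets the clock period, and flip-flop clk-to-Q plus setup time is the ultimate limiter) can be sketched with a very simple timing model. This is only an illustration: the function names are hypothetical, and the 10 ns total logic delay and 0.2 ns flip-flop overhead are made-up numbers, not figures from the lecture or from the XScale paper.

```python
def clock_period_ns(stage_logic_ns, ff_overhead_ns):
    """Clock period set by the slowest stage plus flip-flop overhead."""
    return max(stage_logic_ns) + ff_overhead_ns


def balanced_period_ns(total_logic_ns, n_stages, ff_overhead_ns):
    """Clock period when the logic is split evenly across n_stages."""
    return total_logic_ns / n_stages + ff_overhead_ns


TOTAL_LOGIC_NS = 10.0   # assumed total logic delay (illustrative only)
FF_OVERHEAD_NS = 0.2    # assumed clk-to-Q + setup time (illustrative only)

# An unbalanced 5-stage split is limited by its slowest (3 ns) stage.
print(clock_period_ns([1.0, 3.0, 2.0, 2.0, 2.0], FF_OVERHEAD_NS))

# Deeper balanced pipelines approach, but never beat, the flip-flop overhead.
for n in (5, 10, 20, 40):
    period = balanced_period_ns(TOTAL_LOGIC_NS, n, FF_OVERHEAD_NS)
    print(f"{n:2d} stages: period = {period:.2f} ns")
```

Under these assumed numbers, each doubling of depth buys less than the previous one, and the model does not yet charge for the added load-delay and branch-miss penalties that deeper pipelines incur.
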
The main integer pipeline is seven stages, memory operations follow an eight-stage pipeline, and when operating in thumb mode an extra pipe stage is inserted after the last fetch stage to convert thumb instructions into ARM instructions. Since thumb mode instructions [11] are 16 b, two instructions are fetched in parallel while executing thumb instructions. A simplified diagram of the processor pipeline is shown in Fig. 2, where the state boundaries are indicated by gray. Features that allow the microarchitecture to achieve high speed are as follows.

Fig. 2. Microprocessor pipeline organization.

The shifter and ALU reside in separate stages. The ARM instruction set allows a shift followed by an ALU operation in a single instruction. Previous implementations limited frequency by having the shift and ALU in a single stage. Splitting this operation reduces the critical ALU bypass path by approximately 1/3. The extra pipeline hazard introduced when an instruction is immediately followed by one requiring that the result be shifted is infrequent.

Decoupled Instruction Fetch. A two-instruction deep queue is implemented between the second fetch and instruction decode pipe stages. This allows stalls generated later in the pipe to be deferred by one or more cycles in the earlier pipe stages, thereby allowing instruction fetches to proceed when the pipe is stalled, and also relieves stall speed paths in the instruction fetch and branch prediction units.

Deferred register dependency stalls. While register dependencies are checked in the RF stage, stalls due to these hazards are deferred until the X1 stage. All the necessary operands are then captured from result-forwarding busses as the results are returned to the register file.

One of the major goals of the design was to minimize the energy consumed to complete a given task.
Conventional wisdom has been that shorter pipelines are more efficient due to re-

Example: 8-stage ARM XScale: extra IF, ID, data cache stages. Also, power!

Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

Goal: Improve CPI by issuing several instructions per cycle.
Difficulties: Load and branch delays affect more instructions.
Ultimate Limiter: Programs may be a poor match to issue rules.
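
To make the "poor match to issue rules" limiter concrete, here is a minimal sketch of an in-order, dual-issue machine. The issue rule is an assumption chosen for illustration, not a rule stated in the lecture: an instruction may not issue in the same cycle as an earlier instruction whose destination register it reads or writes. Latencies, branches, and memory are ignored, and the function name and the `(dest, sources)` tuple encoding are hypothetical.

```python
def issue_cycles(program, width=2):
    """Count issue cycles for an in-order machine issuing up to `width`
    instructions per cycle. A group ends early at the first instruction
    that reads or writes a register written earlier in the same group."""
    cycles = 0
    i = 0
    while i < len(program):
        written = set()   # destinations written by this cycle's group
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if written & (set(srcs) | {dest}):
                break     # RAW or WAW hazard inside the group
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Four independent instructions: each pair can issue together.
independent = [
    ("r1", ("r8", "r9")),
    ("r2", ("r10", "r11")),
    ("r3", ("r12", "r13")),
    ("r4", ("r14", "r15")),
]

# A dependence chain: every instruction needs the previous result.
chain = [
    ("r1", ("r1", "r2")),
    ("r1", ("r1", "r3")),
    ("r1", ("r1", "r4")),
    ("r1", ("r1", "r5")),
]

for name, prog in (("independent", independent), ("chain", chain)):
    cycles = issue_cycles(prog)
    print(f"{name}: {cycles} cycles, CPI = {cycles / len(prog):.2f}")
```

Under this assumed rule, the independent sequence uses both issue slots every cycle, while the dependence chain issues one instruction per cycle and gains nothing from the second slot.
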
