Berkeley COMPSCI 152 - The La Valium Processor - D1134275

School name University of California, Berkeley

Course Compsci 152- Computer Architecture and Engineering

Pages 16

This preview shows page 1-2-3-4-5 out of 16 pages.

The La Valium Processor
A CS152 Final Project
Berkeley, CA
12/9/99

By:
Nikhil Acharya
John Loo
Sam Wu
Eugenia Chien
West Yuet Suen

TA: Victor Wen

Table Of Contents:
Introduction and Summary
Feature Descriptions
Performance Summary
Critical Path
Performance Analysis
Testing Philosophy

Appendix: (Please refer to the supplemental file, “TheSecondFile.ps”, for the bulleted items below.)
- Test Programs
- VHDL
- Online Logs
- References for branch prediction

Schematics: Please refer to the supplemental files, “<schematic_name>.ps”; there are 18 of these PostScript files.

NOTE: Please find submitted the following files:
- Lab7_writeup.doc
- appendix.zip, which contains all 18 <schematic_name>.ps files and “TheSecondFile.ps”

Introduction and Summary:

What Did We Do?

The main features of our final processor include a 5-stage, 2-way superscalar datapath, a 2-level Branch Prediction Unit, and an optimized memory sub-system. The Branch Prediction Unit consists of a PHT of 2-bit up-down saturating counters, gselect indexing, and a PC-indexed target buffer. The memory sub-system was optimized by enabling instruction prefetching using a 4-word stream buffer, incorporating a randomized cache replacement policy, and modifying our DRAM controller to return control to the processor as soon as the data is returned from memory.
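To make the predictor terminology above concrete, here is a minimal software sketch of a pattern history table (PHT) of 2-bit saturating counters with gselect indexing, i.e. an index formed by concatenating low-order PC bits with global branch history bits. It is not taken from the project's VHDL: the class name, the table size, the number of PC and history bits, and the initial counter value are all assumptions chosen for illustration, and the target buffer is not modelled.

```python
class GselectPredictor:
    """PHT of 2-bit saturating counters with gselect indexing.

    All sizes are illustrative assumptions, not the report's values.
    """

    def __init__(self, pc_bits=4, history_bits=2):
        self.pc_bits = pc_bits
        self.history_bits = history_bits
        self.history = 0  # global history register, newest outcome in bit 0
        # Counters 0..1 predict not taken, 2..3 predict taken.
        # Assumed reset state: 1 ("weakly not taken").
        self.pht = [1] * (1 << (pc_bits + history_bits))

    def _index(self, pc):
        # gselect: concatenate low-order word-address bits of the PC
        # with the global history bits.
        pc_part = (pc >> 2) & ((1 << self.pc_bits) - 1)
        return (pc_part << self.history_bits) | self.history

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into history."""
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)  # saturate at 3
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # saturate at 0
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_correct(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number predicted correctly."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct
```

For example, `count_correct(GselectPredictor(), 0x40, [True] * 7 + [False])` replays one pass of a loop branch that is taken seven times and then falls through. Because the history bits are part of the index, the same branch trains several different counters as the history changes.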
Please refer to the “Feature Descriptions” section of our report for an in-depth description of the features just listed.

Top-Level Block Diagram of The La Valium Processor:

[Figure: block diagram with the labels Optimized Memory Sub-System, Instructions, Data, Pipeline U, Pipeline V, Super-Scalar Processor Core, and Branch Prediction Unit.]

Spring 99 Mystery Program Performance Summary - How it ran on La Valium:

We present here just our processor’s performance statistics (when running last semester’s Mystery Program), since an in-depth performance analysis follows later in the report:
- Clock rate: 54 MHz
- Execution time: 380.5 µs = 20,569 cycles
- CPI: 3.5
- Paired instructions: 2,642
- Unpaired instructions: 3,197
- D-Cache stalls: 4,750 cycles
- I-Cache stalls: 11,844 cycles

We have run the “merge_sort.s” program on our processor as well, and it yielded results that were very similar to those for the Spring 99 final mystery program (enumerated above).

Feature Descriptions:

Super-Scalar Processor Core:

Motivation for superscalar: An in-order superscalar core was selected because we believed that it would yield good performance gains without excessive development time. We considered making a Tomasulo datapath, but the amount of labor necessary to create one was prohibitive. Also, it would have been necessary to make the Tomasulo core superscalar anyway if we wanted to reap the full benefits of out-of-order execution. Initially, we started off with the idea of creating a superscalar, superpipelined processor, but it rapidly became clear that this was not a good idea. Superpipelining made cycle time the primary concern and made the design rather difficult. We also realized that the benefits of superpipelining an already superscalar core were minimal because of the limited ILP available in adjacent instructions. Superpipelining involved splitting the ALU across two stages, which meant that it was not possible to forward the result from an instruction to the one immediately following it.
In a superscalar design, such forwarding is possible 50% of the time, since instructions are executed in pairs rather than as a continuously overlapping stream. Since superscalar was better at extracting parallelism and there was only a limited amount of parallelism available, there would be little ILP left for superpipelining to extract. Since superpipelining would largely fail at extracting more parallelism, it would serve only to increase latency when instructions execute serially. As a result, we decided to build a superscalar core with a short cycle time to reduce the latency between dependent adjacent instructions.

Pipeline organization: We produced a standard 5-stage superscalar pipeline. The two pipelines were symmetric except that one pipeline, the U pipeline, could execute only the even instructions, while the V pipeline could execute only the odd instructions. Forcing odd instructions to execute in one pipeline and even instructions in the other reduced efficiency slightly, but it made pipeline implementation much easier than the Pentium-style alternative, which requires only that, at each stage, the instruction in one pipeline be one instruction earlier than the instruction in the other. For those cases where we want to load only one instruction into the pipeline, such as a jump to an odd address, correct behavior is ensured by invalidating the instruction introduced into the U (even) pipeline.

Branching in EX: In the interests of keeping the clock rate high, we decided to move the branching hardware out of the ID stage into the EX stage. Placing the branching hardware in the ID stage incurs a high cost, because the branch decision can only occur after all the forwarding logic. Branching logic potentially required an additional 5 ns which, in the context of a sub-20 ns cycle time, was a lot.
Although moving branching later into the pipeline increases the miss penalty, it turns out that the clock rate increase will offset the extra penalty when loop lengths exceed two cycles, even at a 50% prediction rate. With reasonable amounts of branch prediction, the pipeline will provide superior performance quite readily. Although it made a lot of sense to take conditional branches in the EX stage, there was no need to take jumps like j or jal in the EX stage as well, since we know that they are always taken. In spite of this, the pipeline takes those jumps in the EX stage to make the PC-changing hardware more uniform. We reduced the penalty of doing this by feeding the data from the jumps into the branch prediction unit as well. Theoretically, there would be near-perfect prediction rates, and we wouldn’t need to make a special jump unit to change the PC in the IF stage for zero-cycle-penalty jumps. As evidenced by the results, this was not entirely the case, but the prediction was accurate enough to remove most of the penalty. It should be noted that jr type jumps were taken in the EX stage, but its data was
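The break-even claim in the “Branching in EX” discussion above (the faster clock offsets the larger miss penalty once loops exceed about two cycles, even at a 50% prediction rate) can be sanity-checked with a simple timing model. The sketch below is illustrative and not from the report. It assumes a loop of a given length in cycles ending in one branch, a misprediction penalty of 1 cycle for ID-stage branching and 2 cycles for EX-stage branching, an EX-branching cycle time of 1000/54 ns (from the reported 54 MHz clock), and an ID-branching cycle time 5 ns longer. The function and parameter names are hypothetical.

```python
def time_per_iteration(loop_cycles, mispredict_rate, penalty_cycles, cycle_ns):
    """Average time in ns for one loop iteration that ends in a single branch."""
    return (loop_cycles + mispredict_rate * penalty_cycles) * cycle_ns


def break_even_loop_cycles(mispredict_rate, ex_cycle_ns, id_extra_ns,
                           ex_penalty=2, id_penalty=1):
    """Loop length in cycles above which EX-stage branching is faster.

    Derived from (L + m*p_ex) * T_ex < (L + m*p_id) * T_id,
    where T_id = T_ex + id_extra_ns. Penalties are assumed values.
    """
    id_cycle_ns = ex_cycle_ns + id_extra_ns
    extra_cost = ex_penalty * ex_cycle_ns - id_penalty * id_cycle_ns
    return mispredict_rate * extra_cost / id_extra_ns


EX_CYCLE_NS = 1000 / 54   # about 18.5 ns, from the reported 54 MHz clock
ID_EXTRA_NS = 5.0         # the "additional 5 ns" quoted for ID-stage branching

if __name__ == "__main__":
    be = break_even_loop_cycles(0.5, EX_CYCLE_NS, ID_EXTRA_NS)
    print(f"break-even loop length at 50% misprediction: {be:.2f} cycles")
    for loop_cycles in (1, 2, 4, 8):
        ex = time_per_iteration(loop_cycles, 0.5, 2, EX_CYCLE_NS)
        id_ = time_per_iteration(loop_cycles, 0.5, 1, EX_CYCLE_NS + ID_EXTRA_NS)
        print(f"{loop_cycles}-cycle loop: EX {ex:.1f} ns, ID {id_:.1f} ns")
```

Under these assumptions the break-even point falls between one and two cycles at a 50% misprediction rate, so a loop of two cycles or more already favors EX-stage branching. This is roughly in line with the report's statement. Lower misprediction rates move the break-even point further toward zero.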
