4/1/11$1$CS$61C:$Great$Ideas$in$Computer$Architecture$(Machine$Structures)$Instruc(on*Level*Parallelism—*The*Datapath*Instructors:$Randy$H.$Katz$David$A.$PaFerson$hFp://inst.eecs.Berkeley.edu/~cs61c/fa10$4/1/11$ 1$Spring$2011$OO$Lecture$#20$You$Are$Here!$• Parallel$Requests$Assigned$to$computer$e.g.,$Search$“Katz”$• Parallel$Threads$Assigned$to$core$e.g.,$Lookup,$Ads$• Parallel$InstrucYons$>1$instrucYon$@$one$Yme$e.g.,$5$pipelined$instrucYons$• Parallel$Data$>1$data$item$@$one$Yme$e.g.,$Add$of$4$pairs$of$words$• Hardware$descripYons$All$gates$funcYoning$in$parallel$at$same$Yme$4/1/11$ Spring$2011$OO$Lecture$#20$ 3$Smart$Phone$Warehouse$Scale$Computer$So9ware********Hardware*Harness*Parallelism*&*Achieve*High*Performance*Logic$Gates$Core$ Core$…$$$$$$Memory$$$$$$$$$$$$$$$(Cache)$Input/Output$Computer$Main$Memory$Core$$$$$$$$$$InstrucYon$Unit(s)$$$$$$$$FuncYonal$Unit(s)$A3+B3$A2+B2$A1+B1$A0+B0$Today’s $Lecture$Agenda$• Pipelined$ExecuYon$• Administrivia$• Pipelined$Datapath$• Pipeline$Hazards$• Technology$Break$• Pipelining$and$InstrucYon$Set$Design$• Summary$4/1/11$ 4$Spring$2011$OO$Lecture$#20$Agenda$• Pipelined$ExecuYon$• Administrivia$• Pipelined$Datapath$• Pipeline$Hazards$• Technology$Break$• Pipelining$and$InstrucYon$Set$Design$• Summary$4/1/11$ 5$Spring$2011$OO$Lecture$#20$Review:$RISC$Design$Principles$• “A$simpler$core$is$a$faster$core”$• ReducYon$in$the$number$and$complexity$of$instrucYons$in$the$ISA$$simplifies$pipelined$implementaYon$• Common$RISC$strategies:$– Fixed*instrucYon$length,$generally$a$single$word;$$Simplifies$process$of$fetching$instrucYons$from$memory$– Simplified*addressing$modes;$Simplifies$process$of$fetching$operands$from$memory$– Fewer*and$simpler*instrucYons$in$the$instrucYon$set;$Simplifies$process$of$execuYng$instrucYons$– Simplified*memory*access:*only$load$and$store$instrucYons$access$memory;$– Let*the*compiler*do*it.*Use$a$good$compiler$to$break$complex$highOlevel$language$statements$into$a$number$of$simple$assembly$language$statements$4/1/11$ Spring$2011$OO$Lecture$#5$ 6$Review:$SingleOCycle$Processor$• Five$steps$to$design$a$processor:$1.$Analyze$instrucYon$set$$$datapath$requirements$2.$Select$set$of$datapath$$components$&$establish$$clock$methodology$3.$Assemble$datapath$meeYng$$the$requirements:$reOexamine$for$pipelining$4.$Analyze$implementaYon$of$each$instrucYon$to$determine$selng$of$control$points$that$effects$the$register$transfer.$5.$Assemble$the$control$logic$• Formulate$Logic$EquaYons$• Design$Circuits$4/1/11$ Spring$2011$OO$Lecture$#20$ 7$Control'Datapath'Memory'Processor'Input'Output'4/1/11$2$4/1/11$ Spring$2011$OO$Lecture$#20$Single$Cycle$Performance$• Assume$Yme$for$acYons$are$– 100ps$for$register$read$or$write;$200ps$for$other$events$• Clock$rate$is?$Instr Instr fetch Register read ALU op Memory access Register write Total time lw 200ps 100 ps 200ps 200ps 100 ps 800ps sw 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps beq 200ps 100 ps 200ps 500ps 8$• $What$can$we$do$to$improve$clock$rate?$• $Will$this$improve$performance$as$well?$Want$increased$clock$rate$to$mean$faster$programs$Student$RouleFe?$Pipeline$Analogy:$Doing$Laundry$• Ann,$Brian,$Cathy,$Dave$$each$have$one$load$of$clothes$to$wash,$dry,$fold,$and$put$away$– Washer$takes$30$minutes$– Dryer$takes$30$minutes$– “Folder”$takes$30$minutes$– “Stasher”$takes$30$minutes$to$put$clothes$into$drawers$A B C D 4/1/11$ 9$Spring$2011$OO$Lecture$#20$SequenYal$Laundry$• SequenYal$laundry$takes$$8$hours$for$4$loads$T a s k O r d e r B C D A 30 Time 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 6 PM 7 8 9 10 11 12 1 2 AM 4/1/11$ 10$Spring$2011$OO$Lecture$#20$Pipelined$Laundry$• Pipelined*laundry*takes**3.5*hours*for*4*loads!**T a s k O r d e r B C D A 12 2 AM 6 PM 7 8 9 10 11 1 Time 30 30 30 30 30 30 30 4/1/11$ 11$Spring$2011$OO$Lecture$#20$• Pipelining$doesn’t$help$latency$of$single$task,$it$helps$throughput$of$enYre$workload$• MulYple$tasks$operaYng$simultaneously$using$different$resources$• PotenYal$speedup$=$Number$pipe$stages$• Time$to$fill$pipeline$and$Yme$to$drain$it$reduces$speedup:$2.3X$v.$4X$in$this$example$6 PM 7 8 9 Time B C D A 30 30 30 30 30 30 30 T a s k O r d e r Pipelining$Lessons$(1/2)$4/1/11$ 12$Spring$2011$OO$Lecture$#20$• Suppose$new$Washer$takes$20$minutes,$new$Stasher$takes$20$minutes.$How$much$faster$is$pipeline?$• Pipeline$rate$limited$by$slowest$pipeline$stage$• Unbalanced$lengths$of$pipe$stages$reduces$speedup$6 PM 7 8 9 Time B C D A 30 30 30 30 30 30 30 T a s k O r d e r Pipelining$Lessons$(2/2)$4/1/11$ 13$Spring$2011$OO$Lecture$#20$4/1/11$3$Agenda$• Pipelined$ExecuYon$• Administrivia$• Pipelined$Datapath$• Pipeline$Hazards$• Technology$Break$• Pipelining$and$InstrucYon$Set$Design$• Summary$4/1/11$ 14$Spring$2011$OO$Lecture$#20$Administrivia$• Project$4:$Pipelined$Cycle$Processor$in$Logicsim$– Due$Part$1,$datapath,$due$4/10,$Part$2$due$4/17$– FaceOtoOFace$grading:$Signup$for$Ymeslot$last$week$• Extra$Credit:$Fastest$Version$of$Project$3$– Due$4/24$23:59:59$• Final$Review:$$TBD$• Final:$Mon$May$9$11AMO2PM$(TBD)$4/1/11$ Spring$2011$OO$Lecture$#20$ 15$Agenda$• Pipelined$ExecuYon$• Administrivia$• Pipelined$Datapath$• Pipeline$Hazards$• Technology$Break$• Pipelining$and$InstrucYon$Set$Design$• Summary$4/1/11$ 17$Spring$2011$OO$Lecture$#20$Review:$Single$Cycle$Datapath$op' rs' rt' immediate'0$16$21$26$31$• Data$Memory${R[rs]$+$SignExt[imm16]}$$=$$R[rt]$32$ALUctr=$clk$busW$RegWr=$32$32$busA$32$busB$5$ 5$Rw$ Ra$ Rb$RegFile'Rs$Rt$Rt$Rd$RegDst=$Extender'32$16$imm16$ALUSrc=$ExtOp=$MemtoReg=$clk$Data$In$32$MemWr=$zero$0$1$0$1$='ALU'0$1$WrEn$ Adr$Data'Memory'5$InstrucYon<31:0>$<21:25>$<16:20>$<11:15>$<0:15>$Imm16$Rd$Rt$Rs$nPC_sel=$instr'fetch'unit'clk$4/1/11$ 18$Spring$2011$OO$Lecture$#20$1)$IF:$InstrucYon$Fetch,$Increment$PC$2)$ID:$InstrucYon$Decode,$Read$Registers$3)$EX:$$$MemOref:$Calculate$Address$$$ArithOlog:$Perform$OperaYon$4)$Mem:$$$$Load:$Read$Data$from$Memory$$$Store:$Write$Data$to$Memory$5)$WB:$Write$Data$Back$to$Register$Steps$in$ExecuYng$MIPS$4/1/11$
View Full Document