CMSC 411 Computer Systems Architecture
Lecture 3: Reliability and Performance
Alan Sussman
als@cs.umd.edu
CMSC 411 - 3 (from Patterson)

Administrivia
  - Homework problems for Unit 1 posted, due next Thursday, 2/12
  - Read Appendix A: Basic Pipelining
  - Appendix B is useful as a MIPS ISA reference
      - worth reading, but we'll only touch on parts of it in lecture, as needed

Outline
  - Moore's Law and Technology Trends
  - Reliability and MTTF
  - Performance
  - MIPS: An ISA for Pipelining
  - 5-stage pipelining
  - Structural and Data Hazards
  - Forwarding
  - Branch Schemes
  - Exceptions and Interrupts
  - Conclusion

Moore's Law: 2X transistors / "year"
  - "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
  - The number of transistors on a cost-effective integrated circuit doubles every N months (12 <= N <= 24)

Tracking Technology Performance Trends
  - Drill down into 4 technologies:
      - Disks
      - Memory
      - Network
      - Processors
  - Compare 1980 Archaic (Nostalgic) vs. 2000 Modern (Newfangled)
      - Performance milestones in each technology
  - Compare bandwidth vs. latency improvements in performance over time
  - Bandwidth: number of events per unit time
      - E.g., Mbits/second over a network, Mbytes/second from disk
  - Latency: elapsed time for a single event
      - E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)
  CDC Wren I, 1983
      - 3600 RPM
      - 0.03 GBytes capacity
      - Tracks/inch: 800
      - Bits/inch: 9550
      - Three 5.25" platters
      - Bandwidth: 0.6 MBytes/sec
      - Latency: 48.3 ms
      - Cache: none
  Seagate 373453, 2003
      - 15000 RPM (4X)
      - 73.4 GBytes (2500X)
      - Tracks/inch: 64000 (80X)
      - Bits/inch: 533,000 (60X)
      - Four 2.5" platters (in 3.5" form factor)
      - Bandwidth: 86 MBytes/sec (140X)
      - Latency: 5.7 ms (8X)
      - Cache: 8 MBytes

Latency Lags Bandwidth (for last 20 years)
  [Figure: log-log plot of relative BW improvement (y-axis) against relative latency improvement (x-axis). Series: Disk. Milestone labels give (latency improvement, bandwidth improvement).]
  Performance Milestones
      - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
  (latency = simple operation w/o contention; BW = best-case)

Memory: Archaic (Nostalgic) v. Modern (Newfangled)
  1980 DRAM (asynchronous)
      - 0.06 Mbits/chip
      - 64,000 xtors, 35 mm^2
      - 16-bit data bus per module, 16 pins/chip
      - 13 Mbytes/sec
      - Latency: 225 ns
      - (no block transfer)
  2000 Double Data Rate Synchr. (clocked) DRAM
      - 256.00 Mbits/chip (4000X)
      - 256,000,000 xtors, 204 mm^2
      - 64-bit data bus per DIMM, 66 pins/chip (4X)
      - 1600 Mbytes/sec (120X)
      - Latency: 52 ns (4X)
      - Block transfers (page mode)

Latency Lags Bandwidth (last 20 years)
  [Figure: same plot. Series: Memory, Disk.]
  Performance Milestones
      - Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)
  Ethernet 802.3
      - Year of Standard: 1978
      - 10 Mbits/s link speed
      - Latency: 3000 µsec
      - Shared media
      - Coaxial cable
  Ethernet 802.3ae
      - Year of Standard: 2003
      - 10,000 Mbits/s (1000X) link speed
      - Latency: 190 µsec (15X)
      - Switched media
      - Category 5 copper wire
  [Figure: cable cross-sections. Coaxial cable: plastic covering, braided outer conductor, insulator, copper core. Twisted pair: copper, 1 mm thick, twisted to avoid antenna effect. "Cat 5" is 4 twisted pairs in a bundle.]

Latency Lags Bandwidth (last 20 years)
  [Figure: same plot. Series: Network, Memory, Disk.]
  Performance Milestones
      - Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)
  1982 Intel 80286
      - 12.5 MHz
      - 2 MIPS (peak)
      - Latency: 320 ns
      - 134,000 xtors, 47 mm^2
      - 16-bit data bus, 68 pins
      - Microcode interpreter, separate FPU chip
      - (no caches)
  2001 Intel Pentium 4
      - 1500 MHz (120X)
      - 4500 MIPS (peak) (2250X)
      - Latency: 15 ns (20X)
      - 42,000,000 xtors, 217 mm^2
      - 64-bit data bus, 423 pins
      - 3-way superscalar, dynamic translate to RISC, superpipelined (22 stage), out-of-order execution
      - On-chip 8KB data caches, 96KB instr. trace cache, 256KB L2 cache

Latency Lags Bandwidth (last 20 years)
  [Figure: same plot. Series: Processor, Network, Memory, Disk. Annotation: CPU high, Memory low ("Memory Wall").]
  Performance Milestones
      - Processor: '286, '386, '486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)

Rule of Thumb for Latency Lagging BW
  - In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)
  - Stated alternatively: bandwidth improves by more than the square of the improvement in latency

6 Reasons Latency Lags Bandwidth
  1. Moore's Law helps BW more than latency
      - Faster transistors, more transistors, and more pins help bandwidth
          - MPU transistors: 0.130 vs. 42 M xtors (300X)
          - DRAM transistors: 0.064 vs. 256 M xtors (4000X)
          - MPU pins: 68 vs. 423 pins (6X)
          - DRAM pins: 16 vs. 66 pins (4X)
      - Smaller, faster transistors, but they communicate over (relatively) longer wires, which limits latency
          - Feature size: 1.5 to 3 vs. 0.18 micron (8X, 17X)
          - MPU die size: 47 vs. 217 mm^2 (ratio sqrt => 2X)
          - DRAM die size: 35 vs. 204 mm^2 (ratio sqrt => 2X)
  2. Distance limits latency
      - Size of DRAM block => long bit and word lines => most of DRAM access time
      - Speed of light and computers on network

6 Reasons Latency Lags Bandwidth (cont'd)
  3. Bandwidth easier to sell ("bigger = better")
      - E.g., 10 Gbits/s Ethernet ("10 Gig") vs. 10 µsec latency Ethernet
      - 4400 MB/s DIMM ("PC4400") vs. 50 ns latency
      - Even if just marketing, customers now trained
      - Since bandwidth sells, more resources are thrown at bandwidth, which further tips the balance

6 Reasons Latency Lags Bandwidth (cont'd)
  4. Latency helps BW, but
  5. Bandwidth hurts latency
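
Moore's Law as quoted in this lecture (doubling every N months, 12 <= N <= 24) can be compared with the transistor counts given on the Memory and CPU slides. This is a rough sketch under stated assumptions: the function name `doubling_period_months` is hypothetical, growth is assumed to be a clean exponential between the two data points, and the years are taken directly from the slide labels (1980 and 2000 for DRAM, 1982 and 2001 for the processors).

```python
import math


def doubling_period_months(count_start: float, year_start: int,
                           count_end: float, year_end: int) -> float:
    """Implied doubling period in months, assuming pure exponential growth
    between the two (year, transistor count) data points."""
    months = (year_end - year_start) * 12
    doublings = math.log2(count_end / count_start)
    return months / doublings


# Transistor counts as quoted on the slides.
dram_period = doubling_period_months(64_000, 1980, 256_000_000, 2000)
cpu_period = doubling_period_months(134_000, 1982, 42_000_000, 2001)

if __name__ == "__main__":
    print(f"DRAM doubling period: {dram_period:.1f} months")
    print(f"CPU doubling period:  {cpu_period:.1f} months")
```

With these two pairs of data points, the DRAM chips imply a doubling period of about 20 months, inside the quoted 12 to 24 month range, while the 80286-to-Pentium 4 pair implies about 27.5 months, slightly slower than the range.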