CMSC 411 Computer Systems Architecture
Lecture 2: Trends in Technology
CMSC 411 - A. Sussman (from D. O'Leary); technology-trend slides from Patterson

Moore's Law: 2X transistors / year
- "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
- Number of transistors on a cost-effective integrated circuit doubles every N months (N ≥ 12)

Tracking Technology Performance Trends
- Drill down into 4 technologies: disks, memory, network, processors
- Compare 1980 Archaic (Nostalgic) vs. 2000 Modern (Newfangled)
- Performance milestones in each technology
- Compare bandwidth vs. latency improvements in performance over time
- Bandwidth: number of events per unit time
  - E.g., Mbits/second over network, Mbytes/second from disk
- Latency: elapsed time for a single event
  - E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)
CDC Wren I, 1983
- 3600 RPM
- 0.03 GBytes capacity
- Tracks/inch: 800
- Bits/inch: 9550
- Three 5.25" platters
- Bandwidth: 0.6 MBytes/sec
- Latency: 48.3 ms
- Cache: none
Seagate 373453, 2003
- 15000 RPM (4X)
- 73.4 GBytes (2500X)
- Tracks/inch: 64000 (80X)
- Bits/inch: 533,000 (60X)
- Four 2.5" platters (in 3.5" form factor)
- Bandwidth: 86 MBytes/sec (140X)
- Latency: 5.7 ms (8X)
- Cache: 8 MBytes

Memory: Archaic (Nostalgic) v. Modern (Newfangled)
1980 DRAM (asynchronous)
- 0.06 Mbits/chip
- 64,000 xtors, 35 mm2
- 16-bit data bus per module, 16 pins/chip
- 13 Mbytes/sec
- Latency: 225 ns
- No block transfer
2000 Double Data Rate Synchronous (clocked) DRAM
- 256.00 Mbits/chip (4000X)
- 256,000,000 xtors, 204 mm2
- 64-bit data bus per DIMM, 66 pins/chip (4X)
- 1600 Mbytes/sec (120X)
- Latency: 52 ns (4X)
- Block transfers (page mode)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)
Ethernet 802.3
- Year of standard: 1978
- 10 Mbits/s link speed
- Latency: 3000 µsec
- Shared media
- Coaxial cable
Ethernet 802.3ae
- Year of standard: 2003
- 10,000 Mbits/s link speed (1000X)
- Latency: 190 µsec (15X)
- Switched media
- Category 5 copper wire
[Figure: coaxial cable cross-section: plastic covering, braided outer conductor, insulator, copper core]
[Figure: twisted pair: copper, 1 mm thick, twisted to avoid antenna effect; Cat 5 is 4 twisted pairs in a bundle]

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)
1982 Intel 80286
- 12.5 MHz
- 2 MIPS (peak)
- Latency: 320 ns
- 134,000 xtors, 47 mm2
- 16-bit data bus, 68 pins
- Microcode interpreter, separate FPU chip
- No caches
2001 Intel Pentium 4
- 1500 MHz (120X)
- 4500 MIPS (peak) (2250X)
- Latency: 15 ns (20X)
- 42,000,000 xtors, 217 mm2
- 64-bit data bus, 423 pins
- 3-way superscalar, dynamic translation to RISC, superpipelined (22 stages), out-of-order execution
- On-chip 8KB data cache, 96KB instruction trace cache, 256KB L2 cache

Latency Lags Bandwidth (last 20 years)
Performance milestones (latency improvement, bandwidth improvement):
- Processor: '286, '386, '486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
- Ethernet: 10Mb, 100Mb, 1000Mb, 10000 Mb/s (16x, 1000x)
- Memory module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
- Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
Note: latency = simple operation w/o contention; BW = best case.
[Figure: log-log plot of relative bandwidth improvement vs. relative latency improvement for processor, network, memory, and disk, with a reference line where latency improvement = bandwidth improvement. CPU high, memory low ("Memory Wall").]

Rule of Thumb for Latency Lagging BW
- In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)
- Stated alternatively: bandwidth improves by more than the square of the improvement in latency

6 Reasons Latency Lags Bandwidth
1. Moore's Law helps BW more than latency
- Faster transistors, more transistors, and more pins help bandwidth
  - MPU transistors: 0.130 vs. 42 M xtors (300X)
  - DRAM transistors: 0.064 vs. 256 M xtors (4000X)
  - MPU pins: 68 vs. 423 pins (6X)
  - DRAM pins: 16 vs. 66 pins (4X)
- Smaller, faster transistors, but they communicate over (relatively) longer wires, which limits latency
  - Feature size: 1.5 to 3 vs. 0.18 micron (8X, 17X)
  - MPU die size: 47 vs. 217 mm2, ratio sqrt -> 2X
  - DRAM die size: 35 vs. 204 mm2, ratio sqrt -> 2X
2. Distance limits latency
- Size of DRAM block -> long bit and word lines -> most of DRAM access time
- Speed of light and computers on network
3. Bandwidth easier to sell ("bigger = better")
- E.g., 10 Gbits/s Ethernet ("10 Gig") vs. 10 µsec latency Ethernet
- 4400 MB/s DIMM ("PC4400") vs. 50 ns latency
- Even if just marketing, customers are now trained
- Since bandwidth sells, more resources are thrown at bandwidth, which further tips the balance
4. Latency helps BW, but not vice versa
- Spinning a disk faster improves both bandwidth and rotational latency
  - 3600 RPM -> 15000 RPM = 4.2X
  - Average rotational latency: 8.3 ms -> 2.0 ms
  - Things being equal, also helps BW by 4.2X
- Lower DRAM latency -> more accesses/second (higher bandwidth)
- Higher linear density helps disk BW (and capacity), but not disk latency
  - 9,550 BPI -> 533,000 BPI -> 60X in BW
5. Bandwidth hurts latency
- Queues help bandwidth, hurt latency (queuing theory)
- Adding chips to widen a memory module increases bandwidth, but higher fan-out on address lines may increase latency
6. Operating system overhead hurts latency more than bandwidth
- Long messages amortize overhead; overhead is a bigger part of short messages

Trends in Silicon Costs

Summary of Technology Trends
- For disk, LAN, memory, and microprocessor, bandwidth improves by the square of the latency improvement
- In the time that bandwidth doubles, latency improves by no more than 1.2X to 1.4X
- Lag is probably even larger in real systems, as bandwidth gains are multiplied by replicated components
  - Multiple processors in a cluster or even in a chip
  - Multiple disks in a disk array
  - Multiple memory modules in a
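
The rule of thumb in the summary can be checked against the milestone ratios quoted in these slides. The following is a minimal sketch, not part of the original lecture: the function names and the `MILESTONES` table layout are illustrative assumptions, and the only data used are the (latency, bandwidth) improvement factors and disk RPM figures given above. It computes how much latency improves per doubling of bandwidth, tests whether bandwidth improved by more than the square of the latency improvement, and reproduces the average rotational latency figures from reason 4.

```python
import math

# (latency improvement, bandwidth improvement) factors as quoted in the
# slides' performance milestones. The dict layout is an illustrative choice.
MILESTONES = {
    "Processor": (21, 2250),
    "Ethernet": (16, 1000),
    "Memory module": (4, 120),
    "Disk": (8, 143),
}


def latency_gain_per_bw_doubling(latency_x, bandwidth_x):
    """Latency improvement factor achieved in the time bandwidth doubles once."""
    doublings = math.log2(bandwidth_x)  # number of bandwidth doublings
    return latency_x ** (1.0 / doublings)


def check_rule_of_thumb(milestones=MILESTONES):
    """Evaluate both statements of the rule of thumb for each technology."""
    rows = {}
    for name, (lat, bw) in milestones.items():
        rows[name] = {
            "per_doubling": latency_gain_per_bw_doubling(lat, bw),
            "bw_exceeds_latency_squared": bw > lat ** 2,
        }
    return rows


def avg_rotational_latency_ms(rpm):
    """Average rotational latency: time for half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


if __name__ == "__main__":
    for name, row in check_rule_of_thumb().items():
        print(f"{name:14s} latency gain per BW doubling: "
              f"{row['per_doubling']:.2f}  "
              f"BW > latency^2: {row['bw_exceeds_latency_squared']}")
    for rpm in (3600, 15000):
        print(f"{rpm} RPM -> {avg_rotational_latency_ms(rpm):.1f} ms")
```

For the four milestone rows, the per-doubling factor falls between roughly 1.2 and 1.4, and bandwidth exceeds the square of the latency improvement in every case, which matches both statements of the rule. The rotational latency helper gives about 8.3 ms at 3600 RPM and 2.0 ms at 15000 RPM.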