BlueGene/L Supercomputer
George Chiu, IBM Research
01/14/19

Supercomputer Peak Performance
[Chart: peak speed (flops, log scale from 1E+2 to 1E+17) versus year introduced (1940 to 2010), with a doubling time of 1.5 yr. Machines plotted run from ENIAC, UNIVAC, and the IBM 701 (vacuum tubes) through the IBM 7090 and Stretch (transistors), CDC 6600 and ILLIAC IV (ICs), CRAY-1 (vectors), X-MP and Y-MP (parallel vectors), and the MPPs (Delta, Paragon, CM-5, T3D, T3E, ASCI Red, Blue Pacific, ASCI White, ASCI Q, Earth, Red Storm) up to Blue Gene/L. Petaflop and multi-petaflop levels are marked.]

BlueGene/L
- Chip (2 processors): 2.8/5.6 GF/s, 4 MB
- Compute Card (2 chips, 2x1x1): 5.6/11.2 GF/s, 0.5 GB DDR
- Node Board (32 chips, 4x4x2; 16 Compute Cards): 90/180 GF/s, 8 GB DDR
- Cabinet (32 Node Boards, 8x8x16): 2.9/5.7 TF/s, 256 GB DDR
- System (64 cabinets, 64x32x32): 180/360 TF/s, 16 TB DDR

512-Way BG/L Prototype
[Photo]

BlueGene/L Interconnection Networks
3-Dimensional Torus
- Interconnects all compute nodes (65,536)
- Virtual cut-through hardware routing
- 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
- Communications backbone for computations
- 0.7/1.4 Tb/s bisection bandwidth, 67 TB/s total bandwidth
Global Tree
- One-to-all broadcast functionality
- Reduction operations functionality
- 2.8 Gb/s of bandwidth per link
- Latency of tree traversal: 2.5 µs
- 23 TB/s total binary tree bandwidth (64k machine)
- Interconnects all compute and I/O nodes (1024)
Ethernet
- Incorporated into every node ASIC
- Active in the I/O nodes (1:64)
- All external comm. (file I/O, control, user interaction, etc.)

Complete BlueGene/L System at LLNL
[Diagram: BG/L compute nodes (65,536) and BG/L I/O nodes (1,024) connect to a Federated Gigabit Ethernet Switch (2,048 ports). Also attached to the switch: WAN, visualization, archive, CWFS, front-end nodes, service node, and the control network. Port counts labeled on the diagram: 1024, 64, 128, 512, 8, 48, 8, 8.]

Summary of performance results
DGEMM
- 92.3% of dual-core peak on 1 node
- Observed performance at 500 MHz: 3.7 GFlops
- Projected performance at 700 MHz: 5.2 GFlops (tested in lab up to 650 MHz)
LINPACK
- 77% of peak on 1 node
- 70% of peak on 512 nodes (1435 GFlops at 500 MHz)
sPPM, UMT2000
- Single-processor performance roughly on par with POWER3 at 375 MHz
- Tested on up to 128 nodes (also NAS Parallel Benchmarks)
FFT
- Up to 508 MFlops on a single processor at 444 MHz (TU Vienna)
- Pseudo-ops performance (5N log N) at 700 MHz of 1300 MFlops (65% of peak)
STREAM: impressive results even at 444 MHz
- Tuned: Copy 2.4 GB/s, Scale 2.1 GB/s, Add 1.8 GB/s, Triad 1.9 GB/s
- Standard: Copy 1.2 GB/s, Scale 1.1 GB/s, Add 1.2 GB/s, Triad 1.2 GB/s
- At 700 MHz: would beat STREAM numbers for most high-end microprocessors
MPI
- Latency: 4000 cycles (5.5 µs at 700 MHz)
- Bandwidth: full link bandwidth demonstrated on up to 6 links

Applications
BG/L is a general-purpose technical supercomputer
- N-body simulation: molecular dynamics (classical and quantum), plasma physics, stellar dynamics for star clusters and galaxies
- Complex multiphysics code: Computational Fluid Dynamics (weather, climate, sPPM), accretion, Rayleigh-Jeans instability, planetary formation and evolution, radiative transport, magnetohydrodynamics
- Modeling thermonuclear events in/on astrophysical objects: neutron stars, white dwarfs, supernovae
- Radiotelescope
- FFT

Summary
- Embedded technology promises to be an efficient path toward building massively parallel computers optimized at the system level.
- Cost/performance is 20x better than standard methods to get to TFlops.
- Low power is critical to achieving a dense, simple, inexpensive packaging solution.
- Blue Gene/L will have a scientific reach far beyond existing limits for a large class of important scientific problems.
- Blue Gene/L will give insight into possible future product directions.
- Blue Gene/L hardware will be quite flexible. A mature, sophisticated software environment needs to be developed to really determine the reach (both scientific and commercial) of this architecture.
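
The packaging and network figures quoted in these slides (2 chips per compute card, 16 compute cards per node board, 32 node boards per cabinet, 64 cabinets, and one I/O node per 64 compute nodes) can be cross-checked with a little arithmetic. The following is a minimal sketch and is not part of the presentation: the names `LEVELS`, `torus_nodes`, and `io_ratio` are hypothetical, and it assumes the torus dimensions listed at each packaging level multiply out to that level's chip count.

```python
from math import prod

# Packaging levels as quoted on the slides: (name, torus dimensions, chip count).
# Chip counts are built up from 2 chips per compute card, 16 compute cards per
# node board, 32 node boards per cabinet, and 64 cabinets in the full system.
LEVELS = [
    ("Compute card", (2, 1, 1), 2),
    ("Node board", (4, 4, 2), 2 * 16),
    ("Cabinet", (8, 8, 16), 2 * 16 * 32),
    ("System", (64, 32, 32), 2 * 16 * 32 * 64),
]


def torus_nodes(dims):
    """Number of nodes in a 3-D torus with the given dimensions."""
    return prod(dims)


def io_ratio(compute_nodes, io_nodes):
    """Compute nodes served by each I/O node."""
    return compute_nodes // io_nodes


if __name__ == "__main__":
    for name, dims, chips in LEVELS:
        # Assumption being checked: torus size equals chip count at each level.
        assert torus_nodes(dims) == chips
        print(f"{name}: {'x'.join(map(str, dims))} = {chips} chips")
    print("Compute nodes per I/O node:", io_ratio(65536, 1024))
```

Under these assumptions the full 64x32x32 torus comes to the 65,536 compute nodes quoted for the system, and 65,536 compute nodes over 1,024 I/O nodes gives the 1:64 ratio listed for the Ethernet network.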