High Performance Computing for Engineering Applications
Virginia W. Ross, Ph.D.
Air Force Research Laboratory / Information Directorate ([email protected])
University of Wisconsin – Madison, Guest Lecture – April 28, 2011
DISTRIBUTION STATEMENT A. Approved for public release; distribution unlimited. Case numbers 88ABW-2011-1884 & 88ABW-2010-4976

Outline
• 500 TeraFLOPS Heterogeneous HPC Cluster
  – HPC Introduction
  – Hardware
  – Applications
• Cloud Computing for DoD
  – Cloud Computing Background
  – Federal Government Cloud Computing
  – Organizational Benefits Gained from Cloud Computing
• Conclusions

500 TeraFLOPS Heterogeneous Cluster

HPC Introduction
• This system put AFRL/RI in the lead for hosting the largest interactive High Performance Computer (HPC) for the Department of Defense.
• The Cell BE cluster was transitioned to another facility.
• These machines are freely available to government researchers and their contractors.

What makes this advance possible?
• As the server market drove the price-performance improvements that the HPC community leveraged over the past decade, the gaming marketplace may now deliver 10x-20x improvements (in power as well):
  – $3,800: 3.2 GHz dual quad-core Xeon®, 96 GFLOPS (DP), 1,000 Watts (baseline system)
  – $380: 3.2 GHz PS3® with Cell Broadband Engine®, 153 GFLOPS (SP), 135 Watts
    • 1.6x the FLOPS per board at 1/10th the cost
  – $2,000: NVIDIA Tesla C2050, 515 GFLOPS (DP) / 1.03 TFLOPS (SP), 225 Watts
    • roughly 1/10th the cost and 1/20th the power per FLOPS

PlayStation3 Fundamentals
• $380; Cell BE® processor; ~110 Watts
• 256 MB RDRAM (only), 25.6 GB/sec to RDRAM; 160 GB hard drive; Gigabit Ethernet (only)
• 153 GFLOPS single-precision peak; 380 TFLOPS/$M
• Sony Hypervisor; 6 of 8 SPEs available
• Fedora Core 7 or 9 Linux or YDL 6.2; IBM Cell SDK 3.1

AFRL/RIT Horus Cluster
• 10 1U rack servers with 18 General Purpose Graphics Processing Units (GPGPUs)
• NVIDIA C2050: 1.03 TFLOPS SP, 515 GFLOPS DP
• 26 TFLOPS
• Supports TTCP efforts

Key Questions
• Which codes could scale given these constraints?
• Can a hybrid mixture of PS3s and traditional servers mitigate the weaknesses of the PS3s alone and still deliver outstanding price-performance?
• What level of effort is required to deliver a reasonable percentage of the enormous peak throughput?
• A case study approach is being taken to explore these questions.
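To make the price-performance comparison on the "What makes this advance possible?" slide concrete, here is a minimal Python sketch, added for illustration, that recomputes cost per GFLOPS, watts per GFLOPS, and TFLOPS per $M using only the prices, peak FLOPS, and wattages quoted on that slide. Note that the PS3 figure is single precision while the Xeon and Tesla figures are double precision, so the ratios are only indicative.

```python
# Back-of-the-envelope price/power-performance ratios, using the figures
# quoted on the "What makes this advance possible?" slide.
platforms = {
    # name: (price in $, peak GFLOPS as quoted, watts)
    "Xeon baseline (DP)": (3800, 96,  1000),
    "PS3 Cell BE (SP)":   (380,  153, 135),
    "Tesla C2050 (DP)":   (2000, 515, 225),
}

for name, (price, gflops, watts) in platforms.items():
    dollars_per_gflop = price / gflops
    watts_per_gflop = watts / gflops
    tflops_per_million = gflops / price * 1e3   # TFLOPS per $M
    print(f"{name:20s} {dollars_per_gflop:6.2f} $/GFLOPS  "
          f"{watts_per_gflop:5.2f} W/GFLOPS  "
          f"{tflops_per_million:5.0f} TFLOPS/$M")

# Ratios versus the Xeon baseline (roughly the "1/10th the cost,
# 1/20th the power per FLOPS" claims on the slide).
b_price, b_gflops, b_watts = platforms["Xeon baseline (DP)"]
b_cost, b_power = b_price / b_gflops, b_watts / b_gflops
for name, (price, gflops, watts) in platforms.items():
    print(f"{name:20s} cost ratio {(price / gflops) / b_cost:.2f}  "
          f"power ratio {(watts / gflops) / b_power:.2f}")
```

The Tesla ratios land near the 1/10th-cost and 1/20th-power-per-FLOPS figures on the slide; the PS3 works out to roughly 400 TFLOPS/$M rather than the quoted 380 TFLOPS/$M, presumably a matter of rounding or the exact console price used.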
Early Access System Approach
• A 53 TeraFLOPS cluster of PlayStation® 3s was built at the AFRL Information Directorate in Rome, NY to provide early access to the IBM Cell Broadband Engine® chip technology included in the low-priced commodity gaming consoles.
• A heterogeneous cluster with powerful subcluster headnodes is used to balance the architecture in light of PS3 memory and input/output constraints
  – 14 subclusters, each with 24 PS3s and a headnode
• Interactive usage
• Used by the HPCMP community for experimentation

Cell Cluster Architecture
• The Cell Cluster has a peak performance of 51.5 TFLOPS from 336 PS3s and an additional 1.4 TFLOPS from the headnodes on its 14 subclusters.
• Cost: $361K ($257K from HPCMP); the PS3s are 37% of the cost
• Price-performance: 147 TFLOPS/$M
• The 24 PS3s in each subcluster contain, in aggregate, 6 GB of memory and 960 GB of disk. The dual quad-core Xeon headnodes have 32 GB of DRAM and 4 TB of disk each.

500 TFLOPS Architecture (2010 plan)
• ~300 TFLOPS from 2,000 PS3s
• ~200 TFLOPS from GPGPUs on subcluster headnodes
• Cost: ~$2M

500 TFLOPS Architecture: CONDOR Cluster (online December 2010)
• Approx. 270 TFLOPS from 1,760 PS3s
  – 153 GFLOPS per PS3
  – 80 subclusters of 22 PS3s
• Approx. 230 TFLOPS from the subcluster headnodes
  – 2 GPGPUs per headnode (2.1 TFLOPS/headnode)
  – 84 headnodes (dual-socket hexa-core Intel Nehalem 5660, 12 cores)
• Horus Cluster (~26 TFLOPS)
• Cost: approx. $2M
• Total power: 300 kW
[Architecture diagram legend: PS3, 10GbE/1GbE switch, 2U compute node (2.5 TF/s)]

Condor Compute Node (2U)
• Dual Nehalem X5650, 24 GB RAM, 2 TB hard drive, 1200 W power supply
• 2 Tesla GPGPUs
• 40 Gb/s InfiniBand, dual 10 Gb Ethernet
• 2.5 TFLOPS SP or 1.2 TFLOPS DP per node

ARC HPC Facility Layout (75' x 22')
[Floor-plan diagram: Emulab, Coyote Cluster, Horus Cluster, and CONDOR Cluster (HPC assets on the RRS network)]

Cell Cluster: Early Access to Commodity Multicore (10 March 2009; Dr. Richard Linderman, AFRL/RI, Rome, NY)
Solving the hard problems . . .
• This project provides the HPCMP community with early access to HPC-scale commodity multicore through a 336-node cluster of PS3 gaming consoles (53 TF).
• Applications leveraging the >10x price-performance advantage include:
  – Large-scale simulations of neuromorphic computing models
  – GOTCHA radar video SAR for wide-area persistent surveillance
  – Real-time PCID image enhancement for space situational awareness
[Slide images: neuromorphic example of robust recognition of occluded text (sample: "… but beginning to perceive that the handcuffs were not for me and that the military had so far got …"), Gotcha SAR, PCID image enhancement]

Neuromorphic Computing Architecture Simulation Case
• The driving application behind developing a 53 TF class cluster was to support basic research into alternative neuromorphic computing architectures.
• The first of these to be optimized for the PS3 was the "Brain-State-in-a-Box" (BSB); the goal is 1M BSBs simulating in real time.
• The BSB was optimized for the PS3, achieving 18 GFLOPS on each core of the PS3 [6]. Across the 6 cores, 108 GFLOPS/PS3, over 70% of peak, was sustained.
  – 12 staff-week effort for the first PS3 optimization experience
• Hybrid simulations are being constructed with BSBs and "Confabulation" models.
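The briefing does not show the BSB kernel itself; the sketch below is a minimal NumPy illustration of the standard Brain-State-in-a-Box recall iteration, x(t+1) = S(lam*x(t) + alpha*A*x(t)) with S clipping each component to [-1, +1], not the AFRL PS3 implementation. The dimension, coefficients, outer-product training rule, and convergence test are placeholder assumptions; the point is that each step is dominated by a dense matrix-vector product, which is the operation that was hand-tuned to reach the 18 GFLOPS per SPE (108 GFLOPS per PS3, about 70% of the 153 GFLOPS single-precision peak) cited above.

```python
import numpy as np

def bsb_recall(A, x0, alpha=0.2, lam=1.0, max_iters=100):
    """Standard Brain-State-in-a-Box (BSB) recall:
        x(t+1) = S(lam * x(t) + alpha * A @ x(t)),
    where S clips each component to [-1, +1].  The dominant cost per step is
    the dense matrix-vector product, i.e. about 2*N^2 flops for an N-dim BSB.
    """
    x = x0.copy()
    for _ in range(max_iters):
        x_next = np.clip(lam * x + alpha * (A @ x), -1.0, 1.0)
        if np.array_equal(x_next, x):   # settled into a corner of the box
            break
        x = x_next
    return x

# Toy usage: store one +/-1 pattern with a simple outer-product weight matrix
# (a placeholder training rule, just to exercise the recall loop).
rng = np.random.default_rng(0)
pattern = np.sign(rng.standard_normal(64))
A = np.outer(pattern, pattern) / 64.0
noisy = np.clip(pattern + 0.5 * rng.standard_normal(64), -1, 1)
recalled = bsb_recall(A, noisy)
print("bits recovered:", int(np.sum(np.sign(recalled) == pattern)), "of 64")
```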
Minicolumn Model
• Hybrid: attractor + geometric receptors
• Literature reviews: minicolumn anatomy, cortical anatomy, cortical … [slide text truncated]
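As a back-of-the-envelope cross-check, the cluster-level totals quoted on the earlier slides follow from the per-node figures. The minimal sketch below is illustrative only and uses just the numbers quoted above; the quoted ~230 TF headnode total includes the headnode CPU cores as well as the GPGPUs.

```python
# Aggregate peak throughput from the per-node figures quoted on the slides.
PS3_SP_TFLOPS = 0.153      # 153 GFLOPS single-precision peak per PS3
C2050_SP_TFLOPS = 1.03     # per Tesla C2050 GPGPU

# 2009 Cell cluster: 336 PS3s plus 1.4 TF from the 14 headnodes (slide figure).
cell_tf = 336 * PS3_SP_TFLOPS + 1.4
print(f"Cell cluster peak  ~{cell_tf:.1f} TFLOPS (slide: 53 TF)")
print(f"Price-performance  ~{cell_tf / 0.361:.0f} TFLOPS/$M (slide: 147, cost $361K)")

# 2010 Condor cluster: 1,760 PS3s plus 84 headnodes with 2 GPGPUs each;
# the slides quote ~230 TF for the headnodes and ~26 TF for Horus.
condor_ps3_tf = 1760 * PS3_SP_TFLOPS
condor_gpu_tf = 84 * 2 * C2050_SP_TFLOPS
print(f"Condor PS3s        ~{condor_ps3_tf:.0f} TFLOPS (slide: ~270)")
print(f"Condor GPGPUs      ~{condor_gpu_tf:.0f} TFLOPS (part of the ~230 TF headnode total)")
print(f"Condor total       ~{condor_ps3_tf + 230 + 26:.0f} TFLOPS (the '500 TFLOPS'-class system)")
```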

