http://cag.csail.mit.edu/ps3
Student Presentation 6.189 IAP 2007 MIT

Blue-Steel Ray Tracer
Natalia Chernenko, Michael D'Ambrosio, Scott Fisher, Russel Ryan, Brian Sweatt, Leevar Williams
MIT 6.189 IAP 2007 Student Project
Game Developers Conference, March 7, 2007

Imperative Need for Parallel Programming Education
The “Software Crisis”
“To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”
-- E. Dijkstra, 1972 Turing Award Lecture

Multicores are Here
[Figure: # of cores (1 to 512) versus year (1970 to 20??). Single-core chips shown: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon. Multicore chips shown: Raw, Power4, Opteron, Power6, Niagara, Yonah, PExtreme, Tanglewood, Cell, Intel Tflops, Xbox360, Cavium Octeon, Raza XLR, PA-8800, Cisco CSR-1, Picochip PC102, Broadcom 1480, Opteron 4P, Xeon MP, Ambric AM2045.]

Teaching Parallel Programming
● Prof. Saman Amarasinghe (MIT) and Dr.
Rodric Rabbah (IBM)
  Month-long intensive course
  http://cag.csail.mit.edu/ps3 for lectures, recitations, and labs
  Sponsored by Sony, Toshiba, and IBM
  Technical support from Sony, IBM, Terra Soft
● Course outcomes
  Know fundamental concepts of parallel programming (both hardware and software)
  Understand issues of parallel performance
  Able to synthesize a fairly complex parallel program
  Hands-on experience with the Cell processor
    – Sony PS3 consoles running YDL (Yellow Dog Linux)
    – IBM Cell SDK from developerWorks

Learning From Student Perspective
  Fun and challenging context attracted many students
  Using PS3s as the platform for student projects
  Programming the new Cell processor
"PS3 attracted me but hearing about the future of parallel programming kept me around." – student quote

Class Project Competition
● 7 ambitious projects
  Ray Tracer
  Global Illumination
  Linear Algebra Pack
  Molecular Dynamics Simulator
  Speech Synthesizer
  Soft Radio
  Backgammon Tutor
● Presentations, including performance results, available online
  http://cag.csail.mit.edu/ps3/competition.shtml
  Some source code will also be published

Our Project: Ray-Tracer
Blue-Steel

The Idea: Realistic Graphics
A Solution to the rendering equation
  Triangle Rasterization
    – Fast – possible in real time on a single core
    – Inaccurate or tedious for global effects such as shadows, reflection, refraction, or global illumination
    – “Start with speed, try to get realism”
  Ray Tracing
    – Slow – unless done on multiple cores
    – Accurate and natural shadows, reflection, and refraction
    – “Start with realism, try to get speed”

The Idea: Realistic Graphics
● Real time rasterization is done all the time!
Instead, build a fast ray tracer from the ground up to take advantage of multiple cores.
  PS3 is perfect
    – 6 accessible cores for rendering
    – Fast XDR RAM for transferring scene data / frames
    – Practically a GPU on its own – no need for additional hardware
[Figure: two diagrams. "Without GPU, using Blue-Steel" (labels: PPE, six SPEs, GPU, Frame Buffer) and "Modern graphics w/ GPU" (labels: CPU, GPU, Frame Buffer).]

Ray Tracing
● Shoot a ray through each pixel on the screen
● Check for intersections with each object in the scene
● Keep the closest intersection

Ray Tracing
● Shade each point according to the material of the object, as well as the lights in the scene
  Stopping at this level achieves traditional scan-line rasterization quality

Ray Tracing
● Cast rays for shadows, reflection, and refraction
  Recursive rays are processed identically to primary rays
  Framework for global effects is built into ray tracing by design

Ray Tracing on the PS3
● Design Challenges
  Bandwidth & latency of PPE / SPE communication
    – Mailboxes can only hold 128 bits at a time
  Limited size of local store
    – 256 KB for program, execution stack, scene, and frame data
  DMA latency
    – Two orders of magnitude slower than local store

Ray Tracing on the PS3
● Design Challenges
  Inherent SIMD architecture of SPE
    – Scalar code – like most code today – is expensive
  No Branch Prediction
    – 'if' statements and loops are costly
  Load-Balancing
    – Splitting up computation so as to minimize communication / computation overhead

Ray Tracing on the PS3
● High level design
  Clump a set of SPEs together as one rendering engine
    – Each SPE holds a full set of scene data
    – Each SPE renders
only part of the scene
    – Run a full ray tracer on every SPE
    – Engine has a set of instructions just like any processor
      • Instructions are sent to this engine using SPE mailboxes
  SPE-centric framework
    – Each SPE knows what work it must do; the PPE tells it what to render only at the start of the process

Ray Tracing on the PS3
● Tackling the Challenges
  Bandwidth & latency of PPE / SPE communication
    – SPE-centric framework
      • No need for communication during the rendering process
  Limited size of local store
    – Pack data efficiently in vectors
    – Split scene into chunks that can be stored one at a time
  DMA latency
    – Hide latency through double-buffering
    – Work on one type of object while transferring another

Ray Tracing on the PS3
● Tackling the Challenges
  No branch prediction
    – Only 3 explicit 'if' statements in code
    – Have compiler unroll loops
  Inherent SIMD architecture of SPE
    – View everything as packets, work on 4 at a time
  Load
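Sketch: the basic ray tracing loop
The three steps on the first "Ray Tracing" slide (shoot a ray through each pixel, check for intersections with each object, keep the closest intersection) can be sketched roughly as below. This is a minimal Python illustration, not the Blue-Steel code: the sphere-only scene, the eye at the origin, the image plane at z = -1, and the function names `intersect_sphere`, `closest_hit`, and `render_ids` are all assumptions made for the example.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the smallest positive ray parameter t where the ray hits the sphere, or None."""
    # Vector from the sphere center to the ray origin.
    oc = [o - c for o, c in zip(origin, center)]
    # Coefficients of the quadratic a*t^2 + b*t + c = 0 for the ray/sphere intersection.
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-6:  # ignore hits behind (or exactly at) the ray origin
            return t
    return None

def closest_hit(origin, direction, spheres):
    """Check the ray against every sphere and keep the nearest intersection."""
    best_t, best_index = math.inf, None
    for index, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and t < best_t:
            best_t, best_index = t, index
    return best_t, best_index

def render_ids(width, height, spheres):
    """Shoot one ray through each pixel and record which sphere (if any) is hit first."""
    eye = (0.0, 0.0, 0.0)
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel center onto an image plane at z = -1 spanning [-1, 1] x [-1, 1].
            px = (x + 0.5) / width * 2.0 - 1.0
            py = 1.0 - (y + 0.5) / height * 2.0
            _, index = closest_hit(eye, (px, py, -1.0), spheres)
            row.append(index)
        image.append(row)
    return image
```

Shading, shadows, reflection, and refraction would start from the hit returned by `closest_hit`; recursive rays could reuse the same function, which is the sense in which they are "processed identically to primary rays".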
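Sketch: SPE-centric work split and packets of 4
The slides say each SPE renders only part of the scene, knows its own work once the PPE has started it, and views everything as packets of 4. The slides do not say how the image is divided, so the interleaved-scanline split below is only one plausible scheme, chosen for illustration. The names `rows_for_spe` and `packets_of_four` are hypothetical.

```python
def rows_for_spe(spe_id, num_spes, height):
    """Rows owned by one SPE under an interleaved split (an assumed scheme).

    Each SPE can compute this from its own id, so no messages are needed
    while rendering is in progress.
    """
    return list(range(spe_id, height, num_spes))

def packets_of_four(items):
    """Group items into packets of 4, padding the last packet with None."""
    items = list(items)
    padded = items + [None] * (-len(items) % 4)
    return [padded[i:i + 4] for i in range(0, len(padded), 4)]
```

With 6 SPEs and a 16-row image, `rows_for_spe(0, 6, 16)` gives rows 0, 6, and 12. Interleaving tends to spread expensive regions of the image across SPEs, which is one simple way to approach the load-balancing challenge named earlier.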
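Sketch: hiding transfer latency with double-buffering
The "DMA latency" bullets (hide latency through double-buffering, work on one chunk while transferring another) describe a pattern that can be sketched as follows. On the Cell this would use asynchronous DMA into two local-store buffers; here a single background thread stands in for the DMA engine, and `fetch`, `compute`, and `render_chunks` are hypothetical names, not part of the Cell SDK or of Blue-Steel.

```python
from concurrent.futures import ThreadPoolExecutor

def render_chunks(chunk_ids, fetch, compute):
    """Process chunks in order while the next chunk is fetched in the background."""
    chunk_ids = list(chunk_ids)
    results = []
    if not chunk_ids:
        return results
    # One worker thread plays the role of the DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, chunk_ids[0])  # start the first transfer
        for next_id in chunk_ids[1:] + [None]:
            current = pending.result()  # wait for the transfer already in flight
            if next_id is not None:
                pending = dma.submit(fetch, next_id)  # start the next transfer
            results.append(compute(current))  # compute overlaps that transfer
    return results
```

Only two chunks are alive at any time (the one being computed and the one in flight), which is what keeps the scheme compatible with a small local store.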