Lecture 14: Multicore Strategies for Games
Prof. Aaron Lanterman
School of Electrical and Computer Engineering
Georgia Institute of Technology

Bad multithreading
[Diagram: Thread 1, Thread 2, Thread 3, Thread 4, Thread 5]
Slide from Bruce Dawson & Chuck Walbourn, Microsoft Game Technology Group, “Coding for Multiple Cores,” PowerPoint presentation

Good multithreading
[Diagram: Main Thread, Game Thread, Rendering Thread, Physics, Animation/Skinning, Particle Systems, Networking, File I/O]

Another paradigm: cascades
[Diagram: Thread 1: Input, Thread 2: Physics, Thread 3: AI, Thread 4: Rendering, Thread 5: Present]
• Advantages:
– Synchronization points are few and well-defined
• Disadvantages:
– Increases latency (for constant frame rate)
– Needs simple (one-way) data flow
– For balance, each chunk needs to take a similar amount of time

Typical task: File decompression
• Most common CPU-heavy thread on the Xbox 360
• Easy to multithread
• Allows use of aggressive compression to improve load times
• Don’t throw a thread at a problem better solved by offline processing
– Texture compression, file packing, etc.

Typical task: Rendering
• Separate update and render threads
• Rendering on multiple threads usually works poorly
– GPU can have trouble if multiple threads try to talk to it at once (Xbox 360 command buffers are OK)
• Special case of cascades paradigm
– Pass render state from update to render

Separate rendering thread
[Diagram: Update Thread, Buffer 1, Render Thread, Buffer 0]

Typical task: Graphics fluff
• Extra graphics that don’t affect play
– Procedurally generated animating cloud textures
– Cloth simulations
– Dynamic ambient occlusion
– Procedurally generated vegetation, etc.
– Extra particles, better particle physics, etc.
• Easy to synchronize
• One game had one thread manipulating cloth and another thread handling cloth shadows
• On single-core machines, can drop or simplify the fluff without affecting gameplay

Typical tasks: Physics?
• Could cascade from update to physics to rendering
– Makes use of three threads
– May be too much latency
• Could run physics on many threads
– Uses many threads while doing physics
– May leave threads mostly idle elsewhere

Careful with simultaneous multi-threading
• Not the same as double the number of cores
• Can give a small performance boost…
– …if first thread is underutilizing execution resources because of dependency stalls
• Can cause a performance drop
– Two threads may fight over L1 cache
• Can avoid scheduler latency
– A thread may be ready to run, but the OS waits for the current “scheduling quantum” to expire before running it
– Hardware threads can wake up faster; works well if you have a thread that mostly sleeps but needs to wake quickly on demand

Rare’s Kameo
Screenshots from www.rareware.com

Case study: Kameo (1)
• Started out as single threaded
– Was going to be an original Xbox game, but was instead made a 360 launch title
• CPU usage split was 51/49 for update/render, so rendering was put on a separate thread
– Two render-description buffers created to communicate from update to render
– Linear read/write access for best cache usage
– Doesn't copy const data

Case study: Kameo (2)
• Decompression thread:
– Saved space on DVD and improved load times
– Cost was some spare CPU cycles
• Actually two threads for file I/O
– One for reading and one for decompressing, because some calls can block for ~0.5s doing directory lookups
• Multithreading added about six months before launch - but it worked!

Case study: Kameo (3)
Core  Thread  Software threads
0     0       Game update
0     1       File I/O
1     0       Rendering
1     1       (none listed)
2     0       XAudio
2     1       File decompression
• Total usage was ~2.2-2.5 cores
• Utilization figures shown on the slide: 80-99%, 80-99%, 50% (their mapping to cores is not recoverable from the extracted text)

Bizarre Creations’ Project Gotham Racing 3
See http://media.xbox360.gamespy.com/media/741/741362/vids_1.html for movie clips
Screenshot from projectgothamracing3.com/screenshots

Case study: Project Gotham Racing 3
Core  Thread  Software threads
0     0       Update, physics, rendering, UI
0     1       Audio update, networking
1     0       Crowd update, texture decompression
1     1       Texture decompression
2     0       XAudio
2     1       (none listed)
• Total usage was ~2.0-3.0 cores

Available synchronization objects
• Critical sections (locks)
• Events
• Semaphores
• Mutexes
• Don’t suspend threads
– Some games have used this for synchronization
– Can easily lead to deadlocks
– Interacts badly with Visual Studio debugger

Synchronization tips/costs:
• Synchronization is moderately expensive when there is no contention
– Hundreds to thousands of cycles
• Synchronization can be arbitrarily expensive when there is contention!
• Goals:
– Synchronize rarely
– Hold locks briefly
– Minimize shared data
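
The three goals on the synchronization tips slide can be illustrated with a small sketch. It is not code from the lecture or from the Microsoft presentation: it uses portable C++ `std::mutex` and `std::thread` as stand-ins for the Win32 critical sections the slides list, and the names `SharedTotal`, `sum_slice`, and `parallel_sum` are hypothetical. Each worker does its work on private data and takes the lock once, for a single addition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: the only data the workers share.
struct SharedTotal {
    std::mutex lock;
    long long value = 0;
};

// Each worker sums its slice into a local variable (no shared data touched),
// then takes the lock exactly once to publish the result.
void sum_slice(const std::vector<int>& data, std::size_t begin, std::size_t end,
               SharedTotal& total) {
    long long local = 0;
    for (std::size_t i = begin; i < end; ++i) {
        local += data[i];
    }
    std::lock_guard<std::mutex> guard(total.lock);  // held for one addition only
    total.value += local;
}

// Splits the data across `workers` threads and returns the combined sum.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    SharedTotal total;
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back(sum_slice, std::cref(data), begin, end,
                             std::ref(total));
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return total.value;  // safe to read: all workers have been joined
}
```

Locking inside the inner loop instead would give the same answer but would synchronize once per element, which is the pattern the slide warns can become arbitrarily expensive under contention.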
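
The separate rendering thread slide and the Kameo case study both describe two buffers used to pass render state from the update thread to the render thread. The following is a minimal sketch of one way such a handoff could work, not Kameo's actual implementation. `RenderDescription`, `FrameExchange`, `update_thread`, `render_thread`, and `run_frames` are hypothetical names, the "draw commands" are plain integers, and `std::mutex` with `std::condition_variable` is an assumed substitute for the platform's own synchronization objects.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical render description: draw-command IDs for one frame.
using RenderDescription = std::vector<int>;

struct FrameExchange {
    RenderDescription buffers[2];
    std::mutex lock;
    std::condition_variable changed;
    int ready_index = 0;       // buffer the render thread may read
    bool frame_ready = false;  // true from publish until render finishes
};

void update_thread(FrameExchange& ex, int frames) {
    int write_index = 0;
    for (int frame = 0; frame < frames; ++frame) {
        // Fill the buffer the render thread is not using; no lock needed.
        RenderDescription& out = ex.buffers[write_index];
        out.clear();
        for (int i = 0; i < 4; ++i) out.push_back(frame * 10 + i);

        // One synchronization point per frame: wait for the renderer to
        // finish the previous frame, then hand this buffer over.
        std::unique_lock<std::mutex> guard(ex.lock);
        ex.changed.wait(guard, [&ex] { return !ex.frame_ready; });
        ex.ready_index = write_index;
        ex.frame_ready = true;
        guard.unlock();
        ex.changed.notify_all();
        write_index ^= 1;  // next frame goes into the other buffer
    }
}

void render_thread(FrameExchange& ex, int frames, std::vector<int>& checksums) {
    for (int frame = 0; frame < frames; ++frame) {
        int read_index = 0;
        {
            std::unique_lock<std::mutex> guard(ex.lock);
            ex.changed.wait(guard, [&ex] { return ex.frame_ready; });
            read_index = ex.ready_index;
        }
        // "Render" outside the lock while the update thread fills the
        // other buffer. Here rendering is just a checksum.
        int sum = 0;
        for (int id : ex.buffers[read_index]) sum += id;
        checksums.push_back(sum);
        {
            std::lock_guard<std::mutex> guard(ex.lock);
            ex.frame_ready = false;  // give the buffer back
        }
        ex.changed.notify_all();
    }
}

// Runs both threads for `frames` frames and returns the per-frame checksums.
std::vector<int> run_frames(int frames) {
    FrameExchange ex;
    std::vector<int> checksums;
    std::thread render(render_thread, std::ref(ex), frames, std::ref(checksums));
    std::thread update(update_thread, std::ref(ex), frames);
    update.join();
    render.join();
    return checksums;
}
```

In this sketch `run_frames(3)` returns the checksums 6, 46, 86, one per frame in order. The update thread can run at most one frame ahead of the renderer, which is the added latency the cascades slide lists as a disadvantage.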