Multi-core architectures
Jernej Barbic
15-213, Spring 2006
May 4, 2006

Single-core computer

Single-core CPU chip
[Figure label: the single core]

Multi-core architectures
• This lecture is about a new trend in computer architecture: replicate multiple processor cores on a single die.
[Figure: multi-core CPU chip with Core 1, Core 2, Core 3, Core 4]

Multi-core CPU chip
• The cores fit on a single processor socket
• Also called CMP (Chip Multi-Processor)
[Figure: core 1, core 2, core 3, core 4]

The cores run in parallel
[Figure: thread 1 to thread 4, one on each of core 1 to core 4]

Within each core, threads are time-sliced (just like on a uniprocessor)
[Figure: several threads on each of core 1 to core 4]

Interaction with OS
• The OS perceives each core as a separate processor
• The OS scheduler maps threads/processes to different cores
• Most major operating systems support multi-core today

Why multi-core?
• Difficult to make single-core clock frequencies even higher
• Deeply pipelined circuits:
– heat problems
– speed of light problems
– difficult design and verification
– large design teams necessary
– server farms need expensive air-conditioning
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)

Instruction-level parallelism
• Parallelism at the machine-instruction level
• The processor can re-order and pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc.
• Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years

Thread-level parallelism (TLP)
• This is parallelism on a coarser scale
• A server can serve each client in a separate thread (Web server, database server)
• A computer game can do AI, graphics, and physics in three separate threads
• Single-core superscalar processors cannot fully exploit TLP
• Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP

General context: Multiprocessors
• A multiprocessor is any computer with several processors
• SIMD
– Single instruction, multiple data
– Modern graphics cards
• MIMD
– Multiple instructions, multiple data
[Figure: Lemieux cluster, Pittsburgh Supercomputing Center]

Multiprocessor memory types
• Shared memory: in this model, there is one (large) common shared memory for all processors
• Distributed memory: in this model, each processor has its own (small) local memory, and its content is not replicated anywhere else

Multi-core processor is a special kind of a multiprocessor: all processors are on the same chip
• Multi-core processors are MIMD: different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data).
• Multi-core is a shared memory multiprocessor: all cores share the same memory

What applications benefit from multi-core?
• Database servers
• Web servers (Web commerce)
• Compilers
• Multimedia applications
• Scientific applications, CAD/CAM
• In general, applications with thread-level parallelism (as opposed to instruction-level parallelism)
[Callout: each can run on its own core]

More examples
• Editing a photo while recording a TV show through a digital video recorder
• Downloading software while running an anti-virus program
• “Anything that can be threaded today will map efficiently to multi-core”
• BUT: some applications are difficult to parallelize

A technique complementary to multi-core: Simultaneous multithreading
• Problem addressed: the processor pipeline can get stalled:
– Waiting for the result of a long floating point (or integer) operation
– Waiting for data to arrive from memory
• Meanwhile, other execution units wait unused
[Figure: processor pipeline block diagram with BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus. Source: Intel]

Simultaneous multithreading (SMT)
• Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core
• Weaving together multiple “threads” on the same core
• Example: if one thread is waiting for a floating point operation to complete, another thread can use the integer units

Without SMT, only a single thread can run at any given time
[Figure: the same pipeline block diagram, labeled "Thread 1: floating point"]

Without SMT, only a single thread can run at any given time
[Figure: the same pipeline block diagram, labeled "Thread 2: integer operation"]

SMT processor: both threads can run concurrently
[Figure: the same pipeline block diagram]
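
As a rough illustration of the thread-level parallelism described in the slides above, the following Python sketch splits one job into independent chunks and hands each chunk to its own thread, sized to the number of processors the OS reports. The function names (`is_prime`, `count_primes_in`, `count_primes_parallel`) and the prime-counting workload are hypothetical, chosen only for illustration; they are not from the lecture. Two caveats: `os.cpu_count()` reports logical processors, so on an SMT machine each hardware thread is counted, and in standard CPython the global interpreter lock keeps CPU-bound threads from actually running in parallel, so this shows the structure of a TLP program rather than a speedup measurement.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; deliberately simple so the work is easy to follow."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes_in(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def count_primes_parallel(limit, workers=None):
    """Split [0, limit) into roughly one chunk per worker and sum the partial counts."""
    # Assumption: one worker per logical processor reported by the OS.
    workers = workers or os.cpu_count() or 1
    step = max(1, -(-limit // workers))  # ceiling division
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    # Each chunk is independent, so the OS scheduler is free to place
    # the worker threads on different cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes_in, chunks))


if __name__ == "__main__":
    print("logical processors seen by the OS:", os.cpu_count())
    print("primes below 100000:", count_primes_parallel(100_000))
```

The chunks share no mutable state, which is why no locking is needed here; work that updates shared data would need synchronization.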