UH COSC 6385 - Multi-Processors (II) Simultaneous multi-threading and multi-core processors


COSC 6385 Computer Architecture - Multi-Processors (II)
Simultaneous multi-threading and multi-core processors
Edgar Gabriel
Fall 2009

Moore’s Law
Source: http://en.wikipedia.org/wki/Images:Moores_law.svg
• Long-term trend in the number of transistors per integrated circuit
• The number of transistors doubles roughly every 18 months

What do we do with that many transistors?
• Optimizing the execution of a single instruction stream through
  – Pipelining
    • Overlap the execution of multiple instructions
    • Example: all RISC architectures; Intel x86 under the hood
  – Out-of-order execution
    • Allow instructions to overtake each other in accordance with code dependencies (RAW, WAW, WAR)
    • Example: all commercial processors (Intel, AMD, IBM, SUN)
  – Branch prediction and speculative execution
    • Reduce the number of stall cycles due to unresolved branches
    • Example: all commercial processors (Intel, AMD, IBM, SUN)

What do we do with that many transistors? (II)
  – Multi-issue processors
    • Allow multiple instructions to start execution per clock cycle
    • Superscalar (Intel x86, AMD, …) vs. VLIW architectures
  – VLIW/EPIC architectures
    • Allow compilers to indicate independent instructions per issue packet
    • Example: Intel Itanium series
  – Vector units
    • Allow for the efficient expression and execution of vector operations
    • Example: SSE, SSE2, SSE3 instructions

Limitations of optimizing a single instruction stream (II)
• Problem: within a single instruction stream we do not find enough independent instructions to execute simultaneously due to
  – data dependencies
  – limitations of speculative execution across multiple branches
  – difficulties in detecting memory dependencies among instructions (alias analysis)
• Consequence: a significant number of functional units are idle at any given time
• Question: Can we execute instructions from another instruction stream?
  – Another thread?
  – Another process?

Thread-level parallelism
• Problems for executing instructions from multiple threads at the same time
  – The instructions in each thread might use the same register names
  – Each thread has its own program counter
• Virtual memory management allows for the execution of multiple threads and sharing of the main memory
• When to switch between different threads:
  – Fine-grain multithreading: switches between threads on every instruction
  – Coarse-grain multithreading: switches only on costly stalls (e.g. level 2 cache misses)

Simultaneous Multi-Threading (SMT)
• Convert thread-level parallelism to instruction-level parallelism
[Figure panels: Superscalar, Coarse MT, Fine MT, SMT]

Simultaneous multi-threading (II)
• Dynamically scheduled processors already have most hardware mechanisms in place to support SMT (e.g. register renaming)
• Required additional hardware:
  – Register file per thread
  – Program counter per thread
• Operating system view:
  – If a CPU supports n simultaneous threads, the operating system views them as n processors
  – The OS distributes the most time-consuming threads ‘fairly’ across the n processors that it sees

Example for SMT architectures (I)
• Intel Hyperthreading:
  – First released for the Intel Xeon processor family in 2002
  – Supports two architectural sets per CPU
  – Each architectural set has its own
    • General purpose registers
    • Control registers
    • Interrupt control registers
    • Machine state registers
  – Adds less than 5% to the relative chip size
Reference: D.T. Marr et al., “Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, 6(1), 2002, pp. 4-15. ftp://download.intel.com/technology/itj/2002/volume06issue01/vol6iss1_hyper_threading_technology.pdf

Example for SMT architectures (II)
• IBM Power 5
  – Same pipeline as the IBM Power 4 processor but with SMT support
  – Further improvements:
    • Increase associativity of the L1 instruction cache
    • Increase the size of the L2 and L3 caches
    • Add separate instruction prefetch and buffering units for each SMT thread
    • Increase the size of issue queues
    • Increase the number of virtual registers used internally by the processor

Simultaneous Multi-Threading
• Works well if
  – The number of compute-intensive threads does not exceed the number of threads supported in SMT
  – Threads have highly different characteristics (e.g. one thread doing mostly integer operations, another mainly doing floating point operations)
• Does not work well if
  – Threads try to utilize the same functional units
  – Assignment problems:
    • e.g. a dual processor system, each processor supporting 2 threads simultaneously (OS thinks there are 4 processors)
    • 2 compute-intensive application processes might end up on the same processor instead of different processors (OS does not see the difference between SMT and real processors!)

Multi-Core processors
• Next step in the evolution of SMT: replicate not just the architectural state, but also the functional units
• Compute cores on a multi-core processor share the same main memory -> SMP system!
• Difference to previous multi-processor systems:
  – Compute cores are on the same chip
  – Multi-core processors are typically connected over a cache, while previous SMP systems were typically connected over the main memory
    • Performance implications
    • Cache coherence protocol

Multi-core processors: Example (I)
• Intel X7350 quad-core (Tigerton)
  – Private L1 cache: 32 KB instruction, 32 KB data
  – Shared L2 cache: 4 MB unified cache
[Figure: four cores, each with its own L1; each pair of cores on one shared L2; 1066 MHz FSB]

Multi-core processors: Example (I)
• Intel X7350 quad-core (Tigerton) multi-processor configuration
[Figure: four sockets, a Memory Controller Hub (MCH), and four memory blocks; four links labeled 8 GB/s]
  Socket 0: C0 C1 L2 | C8 C9 L2
  Socket 1: C2 C3 L2 | C10 C11 L2
  Socket 2: C4 C5 L2 | C12 C13 L2
  Socket 3: C6 C7 L2 | C14 C15 L2

Multi-core processors: Example (II)
• AMD 8350 quad-core Opteron
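
The assignment problem described on the Simultaneous Multi-Threading slide (two compute-intensive processes landing on the same physical processor because the OS sees four equal processors) can be illustrated with a small placement sketch. This is a minimal sketch, not how any particular operating system scheduler works: the function name `spread_over_cores`, the mapping `logical_to_core`, and the CPU numbering are all hypothetical and chosen only for illustration. The idea is to hand out one logical CPU per physical processor first, and to use SMT siblings only once every physical processor already has a task.

```python
from collections import defaultdict


def spread_over_cores(logical_to_core, n_tasks):
    """Choose logical CPUs for n_tasks, preferring distinct physical cores.

    logical_to_core maps a logical CPU id (what the OS sees) to the id of
    the physical processor it belongs to. Both numberings are assumptions
    made for this sketch.
    """
    # Group the logical CPUs (SMT siblings) by physical processor.
    siblings_by_core = defaultdict(list)
    for cpu in sorted(logical_to_core):
        siblings_by_core[logical_to_core[cpu]].append(cpu)

    chosen = []
    level = 0  # 0 = first hardware thread of each core, 1 = second, ...
    while len(chosen) < n_tasks:
        added = False
        for core in sorted(siblings_by_core):
            siblings = siblings_by_core[core]
            if level < len(siblings) and len(chosen) < n_tasks:
                chosen.append(siblings[level])
                added = True
        if not added:
            raise ValueError("more tasks than logical CPUs")
        level += 1
    return chosen


# The example from the slide: a dual processor system, each processor
# supporting 2 threads, so the OS sees 4 logical processors.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(spread_over_cores(topology, 2))  # [0, 2]: one task per physical processor
print(spread_over_cores(topology, 3))  # [0, 2, 1]: a sibling is used only for the third task
```

On Linux, a process could then be pinned to a chosen logical CPU with `os.sched_setaffinity`, which is available only on some platforms.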

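
The four execution models named on the Simultaneous Multi-Threading (SMT) slide (superscalar, coarse MT, fine MT, SMT) can be contrasted with a deliberately crude slot-counting sketch. Everything here is an assumption made for illustration: the function name `issued_slots`, the input `ready` (per thread and per cycle, how many independent instructions could be issued, with 0 standing for a stall), the rule that instructions not issued in a cycle are simply dropped, and the rule that a coarse-grain switch costs the cycle in which the stall is seen. It is not a processor model, only a way to see how many issue slots each policy can fill for a given input.

```python
def issued_slots(ready, width, policy):
    """Count filled issue slots under a toy multithreading policy.

    ready[t][c] is the number of instructions thread t could issue in
    cycle c (0 models a stall). width is the issue width per cycle.
    """
    n_threads = len(ready)
    n_cycles = len(ready[0])
    current = 0  # thread currently owning the pipeline (coarse MT only)
    used = 0
    for c in range(n_cycles):
        if policy == "superscalar":
            # Single instruction stream: only thread 0 ever issues.
            used += min(width, ready[0][c])
        elif policy == "fine":
            # Fine-grain MT: rotate to the next thread every cycle.
            used += min(width, ready[c % n_threads][c])
        elif policy == "coarse":
            # Coarse-grain MT: stay on one thread until it stalls;
            # the cycle in which the stall is seen issues nothing.
            if ready[current][c] == 0:
                current = (current + 1) % n_threads
            else:
                used += min(width, ready[current][c])
        elif policy == "smt":
            # SMT: fill the slots of one cycle from all threads together.
            used += min(width, sum(thread[c] for thread in ready))
        else:
            raise ValueError("unknown policy: " + policy)
    return used


# Two threads, six cycles, issue width 4 (made-up numbers).
ready = [[3, 0, 0, 0, 2, 1],
         [2, 2, 1, 2, 0, 2]]
for policy in ("superscalar", "coarse", "fine", "smt"):
    print(policy, issued_slots(ready, 4, policy))
# superscalar 6, coarse 7, fine 11, smt 14 (out of 24 available slots)
```

The ordering of the four results holds for this input only; other inputs can rank the policies differently under such a simple model.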
