OSU CS 419 - STUDY NOTES

Intel’s Larrabee
Mike Bailey
Oregon State University
mjb – November 25, 2009
Oregon State University Computer Graphics

Credit: Many of the figures in these notes came from Larry Seiler’s Larrabee Hardware Architecture presentation

Reaching the Promised Land
[Figure: Speed vs. General Programmability, positioning CUDA and NVIDIA GPUs, Intel CPUs, and Larrabee]

Larrabee Characteristics
• x86 instruction set
  – Compatible with all other Intel CPU chips
  – Has extra instructions for vector units (more on this later)
• 32 cores
  – Each core can have 4 threads (i.e., 4 sets of independent register state per core)
  – Each core has its own L1 and L2 cache
  – Each core has a vector unit
  – There are new instructions for vector units (more on this later)
• L2 Caches
  – Fully coherent among the cores
• Intel considers Larrabee a GPU and is marketing it as such
  – But with much more convenient programmability than other GPUs
  – Doesn’t have much fixed-function graphics hardware (texture sampling is about it)
  – Will do most of the graphics pipeline in multicore, vectored software

Cache Behavior
• Independent 32K L1 Instruction and 32K L1 Data caches per core
  – 8-way, 64 Bytes/cache line
  – 64 sets of 8 lines per set
  – Non-Blocking: If 1 thread has a cache miss, other threads keep going
• 256K L2 Cache per Core
  – 4096 lines, 512 sets, 8-way, 64 Bytes/cache line
  – Architectural Central Tag Directory for Coherence
    • Makes programming easier because hardware ensures code always receives the most recent revision of data
• The caches are connected with a Ring Bus
  – 512 bits in each direction

Larrabee Block Diagram
[Figure: Larrabee block diagram]

Two Kinds of Parallelism
• Two kinds of parallelism:
  – Vectors:
    • Good when there is tight synchronization
    • Bad when data being processed follows different paths
  – Threads:
    • Bad when there is tight synchronization
    • Fine when data being processed follows different paths

The CUDA Paradigm
C++ program with both host and CUDA code in it
  – Host code → Compiler and Linker → CPU binary on the host
  – CUDA code → Compiler and Linker → CUDA binary on the GPU

The Larrabee Paradigm
C++ program with both host and LRB code in it
  – Host code → Compiler and Linker → CPU binary on the host
  – LRB code → Compiler and Linker → LRB binary on the GPU

Good Behavior When Amdahl’s Law is Working for You
[Figure: speedup plot with a perfect linear speedup line]
Warning: this is all based on a software simulation of the hardware!

Larrabee and OSU
• We have a research project with Intel, and are using a Larrabee system remotely (one of only a handful of universities that have this access)
• We will be getting 4 or more Larrabee systems here next year
• Might be offering a Larrabee course in Fall 2010?

Vector Unit
• Operates on 512 bits (= 16 floating-point numbers) at a time
  – All 16 operations happen in one clock
  – Larrabee’s 32 cores can compute 512 floating-point operations per clock
  – When kept fully busy, will produce a throughput of 1.5 teraflops
  – C-language inline routines (intrinsics) can be called instead of writing assembly language
Examples:
  Simple vector multiply:
    vmulps v0, v1, v2 ; v0 = v1 * v2
    (the v’s can be registers or memory locations)
  Multiply-add, destination can be the third source:
    vmadd231ps v0, v1, v2 ; v0 = v1 * v2 + v0
  Mask the writing of the elements:
    vmulps v0 {k1}, v1, v2 ; only some of the result is written to v0

Realtime Ray-tracing (?)
[Figure: realtime ray-tracing]

Expected Ray-trace Performance
[Figure: expected ray-trace performance]

Larrabee Native Multithreading
• pthreads and OpenMP are supported
• Larrabee’s multitask library, called XNTask, is the “native” multithreading API
  – Arrange tasks in a dependency graph
  – When a node’s required inputs are all available, that node can run
  – Can support DLP, TLP, or the pipeline pattern
  – Nodes can have different priorities

Some References
Larrabee (GPU), Wikipedia: http://en.wikipedia.org/wiki/Larrabee_(GPU)
Seiler, L., Carmean, D., et al., “Larrabee: A Many-Core x86 Architecture for Visual Computing,” SIGGRAPH 2008 Conference Proceedings, August 2008.
“A First Look at the Larrabee New Instructions”: http://www.ddj.com/architect/216402188
“Rasterization on Larrabee”: http://www.ddj.com/architect/217200602
“Game Physics Performance on the Larrabee Architecture”: http://download.intel.com/technology/architecture-silicon/GamePhysicsOnLarrabee_paper.pdf

