OSU CS 419 - Intel’s Larrabee

Intel’s Larrabee
Mike Bailey
Oregon State University
mjb – November 25, 2009
Oregon State University Computer Graphics

Credit: Many of the figures in these notes came from Larry Seiler’s Larrabee Hardware Architecture presentation.

Reaching the Promised Land
[Figure: chart with axes Speed and General Programmability; labels: NVIDIA GPUs, CUDA, Larrabee, Intel CPUs]

Larrabee Characteristics
• x86 instruction set
  – Compatible with all other Intel CPU chips
  – Has extra instructions for vector units (more on this later)
• 32 cores
  – Each core can have 4 threads (i.e., 4 sets of independent register state per core)
  – Each core has its own L1 and L2 cache
  – Each core has a vector unit
  – There are new instructions for vector units (more on this later)
• L2 caches
  – Fully coherent among the cores
• Intel considers Larrabee a GPU and is marketing it as such
  – But with much more convenient programmability than other GPUs
  – Doesn’t have much fixed-function graphics hardware (texture sampling is about it)
  – Will do most of the graphics pipeline in multicore, vectorized software

Cache Behavior
• Independent 32K L1 instruction and 32K L1 data caches per core
  – 8-way, 64 bytes/cache line
  – 64 sets of 8 lines per set
  – Non-blocking: if 1 thread has a cache miss, other threads keep going
• 256K L2 cache per core
  – 4096 lines, 512 sets, 8-way, 64 bytes/cache line
  – Architectural central tag directory for coherence
    • Makes programming easier because hardware ensures code always receives the most recent revision of data
• The caches are connected with a ring bus
  – 512 bits in each direction

Larrabee Block Diagram
[Figure: Larrabee block diagram]

Two Kinds of Parallelism
• Two kinds of parallelism:
  – Vectors:
    • Good when there is tight synchronization
    • Bad when data being processed follows different paths
  – Threads:
    • Bad when there is tight synchronization
    • Fine when data being processed follows different paths

The CUDA Paradigm
C++ program with both host and CUDA code in it
  – Host code -> Compiler and Linker -> CPU binary on the host
  – CUDA code -> Compiler and Linker -> CUDA binary on the GPU

The Larrabee Paradigm
C++ program with both host and LRB code in it
  – Host code -> Compiler and Linker -> CPU binary on the host
  – LRB code -> Compiler and Linker -> LRB binary on the GPU

Good Behavior When Amdahl’s Law is Working for You
[Figure: speedup plot, including the perfect linear speedup line]
Warning: this is all based on a software simulation of the hardware!

Larrabee and OSU
• We have a research project with Intel, and are using a Larrabee system remotely (one of only a handful of universities that have this access)
• We will be getting 4 or more Larrabee systems here next year
• Might be offering a Larrabee course in Fall 2010?

Vector Unit
• Operates on 512 bits (= 16 floating-point numbers) at a time
  – All 16 operations happen in one clock
  – Larrabee’s 32 cores can compute 512 floating-point operations per clock
  – When fully kept busy, will produce a throughput of 1.5 teraflops
  – C language inline routines are called instead of using the assembly language

Examples:

Simple vector multiply:
    vmulps v0, v1, v2 ; v0 = v1 * v2; v’s can be registers or memory locations

Multiply-add, destination can be the third source:
    vmadd231ps v0, v1, v2 ; v0 = v1 * v2 + v0

Mask the writing of the elements:
    vmulps v0 {k1}, v1, v2 ; only some of the result is written to v0

Realtime Ray-tracing (?)
[Figure]
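To make the three Vector Unit examples above concrete, here is a minimal Python sketch that emulates them on 16-element lists. It is only an illustration of the semantics stated in the instruction comments, not Larrabee code: the function names simply mirror the mnemonics, the helper `lane_ops_per_clock` is hypothetical, and the convention that mask bit i controls lane i is an assumption made for this emulation.

```python
LANES = 16   # 512 bits / 32-bit floats, as stated on the Vector Unit slide
CORES = 32   # core count from the Larrabee Characteristics slide


def vmulps(v0, v1, v2, mask=None):
    """Emulate `vmulps v0 {k1}, v1, v2`.

    Returns the new contents of v0. Lanes whose mask bit is 0 keep their
    old v0 value (assumption: bit i of the mask controls lane i).
    """
    if mask is None:
        mask = (1 << LANES) - 1  # no mask given: write every lane
    return [a * b if (mask >> i) & 1 else old
            for i, (old, a, b) in enumerate(zip(v0, v1, v2))]


def vmadd231ps(v0, v1, v2):
    """Emulate `vmadd231ps v0, v1, v2`: v0 = v1 * v2 + v0."""
    return [a * b + acc for acc, a, b in zip(v0, v1, v2)]


def lane_ops_per_clock(cores=CORES, lanes=LANES):
    """Hypothetical helper: one operation per lane per core per clock."""
    return cores * lanes


v1 = [float(i) for i in range(LANES)]
v2 = [2.0] * LANES
v0 = [1.0] * LANES

product = vmulps(v0, v1, v2)              # every lane written
fused = vmadd231ps(v0, v1, v2)            # v0 doubles as the third source
masked = vmulps(v0, v1, v2, mask=0x00FF)  # only lanes 0..7 written
```

With the mask `0x00FF`, the upper eight lanes of `masked` still hold the original `v0` values, which is the sense in which only some of the result is written. `lane_ops_per_clock()` reproduces the slide's figure of 512 operations per clock for 32 cores of 16 lanes each.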
Expected Ray-trace Performance
[Figure]

Larrabee Native Multithreading
• pthreads and OpenMP are supported
• Larrabee’s multitask library, called XNTask, is the “native” multithreading API
• Arrange tasks in a dependency graph
• When a node’s required inputs are all available, that node can run
• Can support DLP, TLP, or the pipeline pattern
• Nodes can have different priorities

Some References
http://en.wikipedia.org/wiki/Larrabee_(GPU)
Seiler, L., Carmean, D., et al., Larrabee: A many-core x86 architecture for visual computing. SIGGRAPH 2008 Conference Proceedings, August 2008.
A First Look at the Larrabee New Instructions: http://www.ddj.com/architect/216402188
Rasterization on Larrabee: http://www.ddj.com/architect/217200602
Game Physics Performance on the Larrabee Architecture: http://download.intel.com/technology/architecture-silicon/GamePhysicsOnLarrabee_paper.pdf
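Returning to the task-graph model described under Larrabee Native Multithreading: the sketch below shows, in plain Python, how a scheduler might run a node once all of its required inputs are available and use priorities to choose among ready nodes. This is not the XNTask API; `run_task_graph`, its parameters, and the example task names are all hypothetical, and the sketch runs nodes one at a time where a real runtime would dispatch ready nodes across hardware threads.

```python
import heapq


def run_task_graph(tasks, deps, priority=None):
    """Run every task after the tasks it depends on.

    tasks:    dict name -> callable taking a dict {input name: result}
    deps:     dict name -> list of task names whose results it needs
    priority: dict name -> int; among ready nodes, lower runs first
    Returns (execution order, results by name).
    """
    priority = priority or {}
    # Count of unmet inputs per node, and the reverse edges.
    remaining = {name: len(deps.get(name, [])) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, inputs in deps.items():
        for src in inputs:
            dependents[src].append(name)

    # A node is ready when it has no unmet inputs.
    ready = [(priority.get(n, 0), n) for n, c in remaining.items() if c == 0]
    heapq.heapify(ready)

    results, order = {}, []
    while ready:
        _, name = heapq.heappop(ready)
        inputs = {src: results[src] for src in deps.get(name, [])}
        results[name] = tasks[name](inputs)
        order.append(name)
        # Finishing this node may make its dependents ready.
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (priority.get(child, 0), child))

    if len(order) != len(tasks):
        raise ValueError("dependency cycle: some tasks never became ready")
    return order, results


# Hypothetical frame pipeline: two independent stages feed a final one.
tasks = {
    "load": lambda inp: 10,
    "shade": lambda inp: inp["load"] * 2,
    "physics": lambda inp: inp["load"] + 1,
    "present": lambda inp: inp["shade"] + inp["physics"],
}
deps = {"shade": ["load"], "physics": ["load"], "present": ["shade", "physics"]}
priority = {"physics": 0, "shade": 1}

order, results = run_task_graph(tasks, deps, priority)
```

Here `shade` and `physics` become ready together once `load` finishes (the task-parallel case), `physics` goes first because of its priority, and `present` waits for both.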

