Berkeley COMPSCI 258 - ON-CHIP INTERCONNECTION ARCHITECTURE OF THE TILE PROCESSOR


ON-CHIP INTERCONNECTION ARCHITECTURE OF THE TILE PROCESSOR

David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, Anant Agarwal
Tilera

0272-1732/07/$20.00 © 2007 IEEE. Published by the IEEE Computer Society.

iMesh, the Tile Processor Architecture’s on-chip interconnection network, connects the multicore processor’s tiles with five 2D mesh networks, each specialized for a different use. Taking advantage of the five networks, the C-based iLib interconnection library efficiently maps program communication across the on-chip interconnect. The Tile Processor’s first implementation, the TILE64, contains 64 cores and can execute 192 billion 32-bit operations per second at 1 GHz.

As the number of processor cores integrated onto a single die increases, the design space for interconnecting these cores becomes more fertile. One manner of interconnecting the cores is simply to mimic multichip, multiprocessor computers of the past. Following past practice, simple bus-based shared-memory multiprocessors can be integrated onto a single piece of silicon. But, in taking this well-traveled route, we squander the unique opportunities afforded by single-chip integration. Specifically, buses require global broadcast and do not scale to more than about 8 or 16 cores. Some multicore processors have used 1D rings, but rings do not scale well either, because their bisection bandwidth does not increase with the addition of more cores.

This article describes the Tile Processor and its on-chip interconnect network, iMesh, which is a departure from the traditional bus-based multicore processor. The Tile Processor is a tiled multicore architecture developed by Tilera and inspired by MIT’s Raw processor [1,2]. A tiled multicore architecture is a multiple-instruction, multiple-data (MIMD) machine consisting of a 2D grid of homogeneous, general-purpose compute elements, called cores or tiles. Instead of using buses or rings to connect the many on-chip cores, the Tile Architecture couples its processors using five 2D mesh networks, which provide the transport medium for off-chip memory access, I/O, interrupts, and other communication activity.

Having five mesh networks leverages the on-chip wiring resources to provide massive on-chip communication bandwidth. The mesh networks afford 1.28 terabits per second (Tbps) of bandwidth into and out of a single tile, and 2.56 Tbps of bisection bandwidth for an 8 × 8 mesh. By using mesh networks, the Tile Architecture can support anywhere from a few to many processors without modifications to the communication fabric. In fact, the amount of in-core (tile) communications infrastructure remains constant as the number of cores grows. Although the in-core resources do not grow as tiles are added, the bandwidth connecting the cores grows with the number of cores.

However, having a massive amount of interconnect resources is not sufficient if they can’t be effectively utilized.
The interconnect must be flexible enough to efficiently support many different communication needs and programming models. The Tile Architecture’s interconnect provides communication via shared memory and direct user-accessible communication networks. The direct user-accessible communication networks allow for scalar operands, streams of data, and messages to be passed between tiles without system software intervention. The iMesh interconnect architecture also contains specialized hardware to disambiguate flows of dynamic network packets and sort them directly into distinct processor registers. Hardware disambiguation, register mapping, and direct pipeline integration of the dynamic networks provide register-like intertile communication latencies and enable scalar operand transport on dynamic networks. The interconnect architecture also includes Multicore Hardwall, a mechanism that protects one program or operating system from another during use of directly connected networks.

The Tile Architecture also benefits from devoting each of its five separate networks to a different use. By separating the usage of the networks and specializing the interface to their usage, the architecture allows efficient mapping of programs with varied requirements. For example, the Tile Architecture has separate networks for communication with main memory, communication with I/O devices, and user-level scalar operand and stream communication between tiles. Thus, many applications can simultaneously pull in their data over the I/O network, access memory over the memory networks, and communicate among themselves. This diversity provides a natural way to utilize additional bandwidth, and separates traffic to avoid interference.

Taking advantage of the huge amount of bandwidth afforded by the on-chip integration of multiple mesh networks requires new programming APIs and a tuned software runtime system.
This article also introduces iLib, Tilera’s C-based user-level API library, which provides primitives for streaming and messaging, much like a lightweight form of the familiar sockets API. iLib maps onto the user-level networks without the overhead of system software.

Tile Processor Architecture overview

The Tile Processor Architecture consists of a 2D grid of identical compute elements, called tiles. Each tile is a powerful, full-featured computing system that can independently run an entire operating system, such as Linux. Likewise, multiple tiles can be combined to run a multiprocessor operating system such as SMP Linux. Figure 1 is a block diagram of the 64-tile TILE64 processor. Figure 2 shows the major components inside a tile.

As Figure 1 shows, the perimeters of the mesh networks in a Tile Processor connect to I/O and memory controllers, which in turn connect to the respective off-chip I/O devices and DRAMs through the chip’s pins. Each tile combines a processor and its associated cache hierarchy with a switch, which implements the Tile Processor’s various

