On-Chip Interconnection Architecture of the Tile Processor

iMesh, the Tile Processor Architecture's on-chip interconnection network, connects the multicore processor's tiles with five 2D mesh networks, each specialized for a different use. Taking advantage of the five networks, the C-based iLib interconnection library efficiently maps program communication across the on-chip interconnect. The Tile Processor's first implementation, the TILE64, contains 64 cores and can execute 192 billion 32-bit operations per second at 1 GHz.

As the number of processor cores integrated onto a single die increases, the design space for interconnecting these cores becomes more fertile. One manner of interconnecting the cores is simply to mimic multichip, multiprocessor computers of the past. Following past practice, simple bus-based shared-memory multiprocessors can be integrated onto a single piece of silicon. But, in taking this well-traveled route, we squander the unique opportunities afforded by single-chip integration. Specifically, buses require global broadcast and do not scale to more than about 8 or 16 cores.
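As a quick sanity check on the peak-throughput figure in the summary above, the sketch below divides 192 billion operations per second by 64 cores at 1 GHz. The per-core rate it returns is derived arithmetic, not a figure stated in this section.

```python
# Back-of-the-envelope check of the TILE64 peak-throughput figure.
# CORES, CLOCK_HZ, and PEAK_OPS_PER_S come from the article's summary;
# the per-core, per-cycle rate is derived here, not quoted.
CORES = 64
CLOCK_HZ = 1_000_000_000           # 1 GHz
PEAK_OPS_PER_S = 192_000_000_000   # 192 billion 32-bit operations per second


def ops_per_core_per_cycle(peak_ops, cores, clock_hz):
    """Operations each core must complete per clock to reach the peak rate."""
    return peak_ops / (cores * clock_hz)


if __name__ == "__main__":
    rate = ops_per_core_per_cycle(PEAK_OPS_PER_S, CORES, CLOCK_HZ)
    print(f"{rate} 32-bit operations per core per cycle")
```

The result, three operations per tile per cycle, is simply what the quoted numbers imply. The sketch says nothing about how a tile's pipeline achieves that rate.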
Some multicore processors have used 1D rings, but rings do not scale well either, because their bisection bandwidth does not increase with the addition of more cores.

This article describes the Tile Processor and its on-chip interconnect network, iMesh, which is a departure from the traditional bus-based multicore processor. The Tile Processor is a tiled multicore architecture developed by Tilera and inspired by MIT's Raw processor [1,2]. A tiled multicore architecture is a multiple-instruction, multiple-data (MIMD) machine consisting of a 2D grid of homogeneous, general-purpose compute elements, called cores or tiles. Instead of using buses or rings to connect the many on-chip cores, the Tile Architecture couples its processors using five 2D mesh networks, which provide the transport medium for off-chip memory access, I/O, interrupts, and other communication activity.

Having five mesh networks leverages the on-chip wiring resources to provide massive on-chip communication bandwidth. The mesh networks afford 1.28 terabits per second (Tbps) of bandwidth into and out of a single tile, and 2.56 Tbps of bisection bandwidth for an 8 × 8 mesh. By using mesh networks, the Tile Architecture can support anywhere from a few to many processors without modifications to the communication fabric. In fact, the amount of in-core (tile) communications infrastructure remains constant as the number of cores grows. Although the in-core resources do not grow as tiles are added, the bandwidth connecting the cores grows with the number of cores.

David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, Anant Agarwal
Tilera

0272-1732/07/$20.00 © 2007 IEEE. Published by the IEEE Computer Society.

However, having a massive amount of interconnect resources is not sufficient if they can't be effectively utilized.
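The 1.28 Tbps per-tile and 2.56 Tbps bisection figures quoted above, and the claim that ring bisection bandwidth does not grow with core count, can be reproduced with simple link counting. This section does not give link widths, so the sketch assumes each of the five networks has a 32-bit link in each direction between neighboring tiles, clocked at 1 GHz; that is one set of parameters consistent with both quoted numbers. The ring comparison is hypothetical and gives the ring the same five 32-bit networks.

```python
# ASSUMPTION: per-link parameters are not stated in this section. Each of the
# five networks is modeled as a 32-bit link per direction at 1 GHz.
LINK_BITS = 32
CLOCK_HZ = 1_000_000_000
NETWORKS = 5


def link_gbps(width_bits=LINK_BITS, clock_hz=CLOCK_HZ):
    """One-direction bandwidth of a single link, in Gbps."""
    return width_bits * clock_hz / 1e9


def tile_io_gbps(ports=4, networks=NETWORKS):
    """Bandwidth into plus out of one tile through its N, S, E, W ports."""
    return ports * 2 * networks * link_gbps()


def mesh_bisection_gbps(k, networks=NETWORKS):
    """Bisection bandwidth of a k x k mesh, counting both directions.

    Cutting the mesh into two halves severs k links, one per row, so the
    result grows with the square root of the core count."""
    return k * 2 * networks * link_gbps()


def ring_bisection_gbps(n, networks=NETWORKS):
    """Bisection bandwidth of an n-node bidirectional ring (hypothetical).

    Any bisection of a ring severs exactly two links, so n does not appear
    in the result."""
    return 2 * 2 * networks * link_gbps()


if __name__ == "__main__":
    print(f"per tile: {tile_io_gbps():.1f} Gbps")
    for k in (4, 8, 16):
        print(f"{k * k:4d} cores: mesh {mesh_bisection_gbps(k):7.1f} Gbps, "
              f"ring {ring_bisection_gbps(k * k):6.1f} Gbps")
```

Doubling the mesh side quadruples the core count and doubles the bisection bandwidth, while the ring's bisection bandwidth stays fixed. This matches the scaling argument in the text.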
The interconnect must be flexible enough to efficiently support many different communication needs and programming models. The Tile Architecture's interconnect provides communication via shared memory and direct user-accessible communication networks. The direct user-accessible communication networks allow for scalar operands, streams of data, and messages to be passed between tiles without system software intervention. The iMesh interconnect architecture also contains specialized hardware to disambiguate flows of dynamic network packets and sort them directly into distinct processor registers. Hardware disambiguation, register mapping, and direct pipeline integration of the dynamic networks provide register-like intertile communication latencies and enable scalar operand transport on dynamic networks. The interconnect architecture also includes Multicore Hardwall, a mechanism that protects one program or operating system from another during use of directly connected networks.

The Tile Architecture also benefits from devoting each of its five separate networks to a different use. By separating the usage of the networks and specializing the interface to their usage, the architecture allows efficient mapping of programs with varied requirements. For example, the Tile Architecture has separate networks for communication with main memory, communication with I/O devices, and user-level scalar operand and stream communication between tiles. Thus, many applications can simultaneously pull in their data over the I/O network, access memory over the memory networks, and communicate among themselves. This diversity provides a natural way to utilize additional bandwidth, and separates traffic to avoid interference.

Taking advantage of the huge amount of bandwidth afforded by the on-chip integration of multiple mesh networks requires new programming APIs and a tuned software runtime system.
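To illustrate what sorting flows of dynamic network packets into distinct registers means, the sketch below is a small software model of tag-based demultiplexing. It is not the hardware design. The number of queues, the numeric tag in each packet header, and the catch-all queue for unmatched tags are all assumptions of this sketch. The section only states that hardware separates packet flows and delivers them directly into processor registers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    tag: int       # HYPOTHETICAL flow tag carried in the packet header
    payload: list  # payload words


class Demux:
    """Software model of tag-based demultiplexing into per-flow queues.

    Each queue stands in for a register-mapped network input. A reader pops
    words from the queue bound to the flow it expects, so interleaved flows
    never have to be separated in software. ASSUMPTION: packets whose tag
    matches no queue go to a catch-all queue for software to handle.
    """

    def __init__(self, tags):
        self.queues = {tag: deque() for tag in tags}
        self.catch_all = deque()

    def deliver(self, pkt):
        """Sort an arriving packet by tag, as the receiving hardware would."""
        queue = self.queues.get(pkt.tag)
        if queue is None:
            self.catch_all.append(pkt)
        else:
            queue.extend(pkt.payload)

    def read(self, tag):
        """Pop the next word of one flow, analogous to a register read."""
        return self.queues[tag].popleft()


if __name__ == "__main__":
    demux = Demux(tags=[0, 1, 2, 3])
    # Two flows arrive interleaved, plus one packet with an unbound tag.
    demux.deliver(Packet(tag=1, payload=[10, 11]))
    demux.deliver(Packet(tag=3, payload=[30]))
    demux.deliver(Packet(tag=1, payload=[12]))
    demux.deliver(Packet(tag=7, payload=[70]))
    print([demux.read(1) for _ in range(3)], demux.read(3), len(demux.catch_all))
```

The design point being illustrated is that sorting happens on arrival. The consumer of flow 1 sees only its own words, in order, even though flow 3's packet arrived between them.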
This article also introduces iLib, Tilera's C-based user-level API library, which provides primitives for streaming and messaging, much like a lightweight form of the familiar sockets API. iLib maps onto the user-level networks without the overhead of system software.

Tile Processor Architecture overview

The Tile Processor Architecture consists of a 2D grid of identical compute elements, called tiles. Each tile is a powerful, full-featured computing system that can independently run an entire operating system, such as Linux. Likewise, multiple tiles can be combined to run a multiprocessor operating system such as SMP Linux. Figure 1 is a block diagram of the 64-tile TILE64 processor. Figure 2 shows the major components inside a tile.

As Figure 1 shows, the perimeters of the mesh networks in a Tile Processor connect to I/O and memory controllers, which in turn connect to the respective off-chip I/O devices and DRAMs through the chip's pins. Each tile combines a processor and its associated cache hierarchy with a switch, which implements the Tile Processor's various