Berkeley COMPSCI 152 - Lecture 23 – Synchronization

UC Regents Fall 2005 © UCB
CS 152 Computer Architecture and Engineering
Lecture 23 – Synchronization
2005-11-17
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
www-inst.eecs.berkeley.edu/~cs152/
TAs: David Marquardt and Udam Saini

Last Time: How Routers Work

[Excerpt: IEEE/ACM Transactions on Networking, vol. 6, no. 3, June 1998, p. 238. Fig. 1: MGR outline.]

A. Design Summary

A simplified outline of the MGR design is shown in Fig. 1, which illustrates the data processing path for a stream of packets entering from the line card on the left and exiting from the line card on the right.

The MGR consists of multiple line cards (each supporting one or more network interfaces) and forwarding engine cards, all plugged into a high-speed switch. When a packet arrives at a line card, its header is removed and passed through the switch to a forwarding engine. (The remainder of the packet remains on the inbound line card.) The forwarding engine reads the header to determine how to forward the packet and then updates the header and sends the updated header and its forwarding instructions back to the inbound line card. The inbound line card integrates the new header with the rest of the packet and sends the entire packet to the outbound line card for transmission.

Not shown in Fig. 1, but an important piece of the MGR, is a control processor, called the network processor, that provides basic management functions such as link up/down management and generation of forwarding engine routing tables for the router.

B. Major Innovations

There are five novel elements of this design. This section briefly presents the innovations. More detailed discussions, when needed, can be found in the sections following.

First, each forwarding engine has a complete set of the routing tables. Historically, routers have kept a central master routing table, and the satellite processors each keep only a modest cache of recently used routes.
If a route was not in a satellite processor's cache, it would request the relevant route from the central table. At high speeds, the central table can easily become a bottleneck, because the cost of retrieving a route from the central table is many times (as much as 1000 times) more expensive than actually processing the packet header. So the solution is to push the routing tables down into each forwarding engine. Since the forwarding engines only require a summary of the data in the route (in particular, next hop information), their copies of the routing table, called forwarding tables, can be very small (as little as 100 kB for about 50k routes [6]).

Second, the design uses a switched backplane. Until very recently, the standard router used a shared bus rather than a switched backplane. However, to go fast, one really needs the parallelism of a switch. Our particular switch was custom designed to meet the needs of an Internet protocol (IP) router.

Third, the design places forwarding engines on boards distinct from line cards. Historically, forwarding processors have been placed on the line cards. We chose to separate them for several reasons. One reason was expediency; we were not sure if we had enough board real estate to fit both forwarding engine functionality and line card functions on the target card size. Another set of reasons involves flexibility. There are well-known industry cases of router designers crippling their routers by putting too weak a processor on the line card, effectively throttling the line card's interfaces to the processor's speed. Rather than risk this mistake, we built the fastest forwarding engine we could and allowed as many (or few) interfaces as is appropriate to share the use of the forwarding engine.
This decision had the additional benefit of making support for virtual private networks very simple: we can dedicate a forwarding engine to each virtual network and ensure that packets never cross (and risk confusion) in the forwarding path.

Placing forwarding engines on separate cards led to a fourth innovation. Because the forwarding engines are separate from the line cards, they may receive packets from line cards that …

2. Forwarding engine determines the next hop for the packet, and returns next-hop data to the line card, together with an updated header.

Recall: Two CPUs sharing memory

… supports a 1.875-Mbyte on-chip L2 cache. Power4 and Power4+ systems both have 32-Mbyte L3 caches, whereas Power5 systems have a 36-Mbyte L3 cache.

The L3 cache operates as a backdoor with separate buses for reads and writes that operate at half processor speed. In Power4 and Power4+ systems, the L3 was an inline cache for data retrieved from memory. Because of the higher transistor density of the Power5's 130-nm technology, we could move the memory controller on chip and eliminate a chip previously needed for the memory controller function. These two changes in the Power5 also have the significant side benefits of reducing latency to the L3 cache and main memory, as well as reducing the number of chips necessary to build a system.

Chip overview

Figure 2 shows the Power5 chip, which IBM fabricates using silicon-on-insulator (SOI) devices and copper interconnect. SOI technology reduces device capacitance to increase transistor performance [5]. Copper interconnect decreases wire resistance and reduces delays in wire-dominated chip-timing paths. In 130-nm lithography, the chip uses eight metal levels and measures 389 mm². The Power5 processor supports the 64-bit PowerPC architecture. A single die contains two identical processor cores, each supporting two logical threads.
This architecture makes the chip appear as a four-way symmetric multiprocessor to the operating system. The two cores share a 1.875-Mbyte (1,920-Kbyte) L2 cache. We implemented the L2 cache as three identical slices with separate controllers for each. The L2 slices are 10-way set-associative with 512 congruence classes of 128-byte lines. The data's real address determines which L2 slice the data is cached in. Either processor core can independently access each L2 controller.

We also integrated the directory for an off-chip 36-Mbyte L3 cache on the Power5 chip. Having the L3 cache directory on chip allows the processor to check the directory after an L2 miss without experiencing off-chip delays. To reduce memory latencies, we integrated the memory controller on the chip. This eliminates driver and receiver delays to an external controller.

Processor core

We designed the Power5 …

