11/28/01 ©UCB Fall 2001 CS152 / Kubiatowicz Lec24.1

CS152
Computer Architecture and Engineering
Lecture 24
Busses (continued)
Queueing Theory
Disk IO
November 28, 2001
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

Recap: Making address translation practical: TLB
° Virtual memory => memory acts like a cache for the disk
° Page table maps virtual page numbers to physical frames
° Translation Look-aside Buffer (TLB) is a cache of translations
[Figure: a virtual address (page, offset) in the virtual address space is translated through the TLB and page table to a physical address (frame, offset) in the physical memory space]

Recap: Overlapped TLB & Cache Access
[Figure: the TLB (associative lookup on the 20-bit page #) is accessed in parallel with a 4K cache (1 K entries of 4 bytes, indexed by 10 bits of the displacement); the frame number (FN) from the TLB is compared with the FN stored in the cache to produce Hit/Miss]
° If we do this in parallel, we have to be careful, however:
  • What if cache size is increased to 8KB?

Recap: A Three-Bus System (+ backside cache)
° A small number of backplane buses tap into the processor-memory bus
  • Processor-memory bus is only used for processor-memory traffic
  • I/O buses are connected to the backplane bus
° Advantage: loading on the processor bus is greatly reduced
[Figure: processor and memory on the processor-memory bus; bus adaptors connecting to the I/O buses; L2 cache on a backside cache bus]

Recap: Main components of Intel Chipset: Pentium II/III
° Northbridge:
  • Handles memory
  • Graphics
° Southbridge: I/O
  • PCI bus
  • Disk controllers
  • USB controllers
  • Audio
  • Serial I/O
  • Interrupt controller
  • Timers

Recap: Synchronous and Asynchronous Bus
° Synchronous Bus:
  • Includes a clock in the control lines
  • A fixed protocol relative to the clock
  • Advantage: little logic and very fast
  • Disadvantages:
    - Every device on the bus must run at the same clock rate
    - To avoid clock skew, they cannot be long if they are fast
° Asynchronous Bus:
  • It is not clocked
  • It can accommodate a wide range of devices
  • It can be lengthened without worrying about clock skew
  • It requires a handshaking protocol

Multiple Potential Bus Masters: the Need for Arbitration
° Bus arbitration scheme:
  • A bus master wanting to use the bus asserts the bus request
  • A bus master cannot use the bus until its request is granted
  • A bus master must signal to the arbiter after it finishes using the bus
° Bus arbitration schemes usually try to balance two factors:
  • Bus priority: the highest priority device should be serviced first
  • Fairness: Even the lowest priority device should never be completely locked out from the bus
° Bus arbitration schemes can be divided into four broad classes:
  • Daisy chain arbitration
  • Centralized, parallel arbitration
  • Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.
  • Distributed arbitration by collision detection: Each device just “goes for it”. Problems found after the fact.

Arbitration: Obtaining Access to the Bus
° One of the most important issues in bus design:
  • How is the bus reserved by a device that wishes to use it?
° Chaos is avoided by a master-slave arrangement:
  • Only the bus master can control access to the bus: It initiates and controls all bus requests
  • A slave responds to read and write requests
° The simplest system:
  • Processor is the only bus master
  • All bus requests must be controlled by the processor
  • Major drawback: the processor is involved in every transaction
[Figure: bus master and bus slave. Control: master initiates requests. Data can go either way]

The Daisy Chain Bus Arbitration Scheme
° Advantage: simple
° Disadvantages:
  • Cannot assure fairness: A low-priority device may be locked out indefinitely
  • The use of the daisy chain grant signal also limits the bus speed
[Figure: bus arbiter with the Grant signal chained from Device 1 (highest priority) through Device 2 to Device N (lowest priority); shared Release and Request lines (wired-OR)]

Centralized Parallel Arbitration
° Used in essentially all processor-memory busses and in high-speed I/O busses
[Figure: bus arbiter with Grant and Req lines to Device 1, Device 2, ..., Device N]

Increasing the Bus Bandwidth
° Separate versus multiplexed address and data lines:
  • Address and data can be transmitted in one bus cycle if separate address and data lines are available
  • Cost: (a) more bus lines, (b) increased complexity
° Data bus width:
  • By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  • Example: SPARCstation 20’s memory bus is 128 bits wide
  • Cost: more bus lines
° Block transfers:
  • Allow the bus to transfer multiple words in back-to-back bus cycles
  • Only one address needs to be sent at the beginning
  • The bus is not released until the last word is transferred
  • Cost: (a) increased complexity, (b) increased response time for other requests

Increasing Transaction Rate on Multimaster Bus
° Overlapped arbitration
  • perform arbitration for next transaction during current transaction
° Bus parking
  • master can hold onto the bus and perform multiple transactions as long as no other master makes a request
° Overlapped address / data phases (prev. slide)
  • requires one of the above techniques
° Split-phase (or packet switched) bus
  • completely separate address and data phases
  • arbitrate separately for each
  • address phase yields a tag which is matched with the data phase
° “All of the above” in most modern buses

What is DMA (Direct Memory Access)?
° Typical I/O devices must transfer large amounts of data to the processor's memory:
  • Disk must transfer complete block
  • Large packets from network
  • Regions of frame buffer
° DMA gives external device ability to access memory directly: much lower overhead than having processor request one word at a time.
° Issue: Cache coherence:
  • What if I/O devices write data that is currently in processor Cache?
    - The processor may never see new data!
  • Solutions:
    - Flush cache on every I/O operation (expensive)
    - Have hardware invalidate cache lines (remember “Coherence” cache misses?)
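The cache coherence issue on the DMA slide can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: the class `ToySystem`, its methods, and the address 0x100 are all hypothetical names invented here, and the "cache" is just a dictionary with no lines, tags, or replacement policy. It shows a processor reading stale data after a device writes memory directly, and how invalidating the cached copy on the DMA write avoids that.

```python
class ToySystem:
    """Toy model (hypothetical): main memory plus a processor cache of word copies."""

    def __init__(self, invalidate_on_dma):
        self.memory = {}   # address -> value held in main memory
        self.cache = {}    # address -> copy currently visible to the processor
        self.invalidate_on_dma = invalidate_on_dma

    def cpu_read(self, addr):
        # On a miss, fill the cache from memory; on a hit, return the cached copy.
        if addr not in self.cache:
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]

    def dma_write(self, addr, value):
        # The device writes main memory directly, bypassing the processor.
        self.memory[addr] = value
        if self.invalidate_on_dma:
            # Hardware-style fix: drop the cached copy so the next read misses.
            self.cache.pop(addr, None)


def read_after_dma(invalidate_on_dma):
    system = ToySystem(invalidate_on_dma)
    system.memory[0x100] = 1
    system.cpu_read(0x100)       # processor caches the old value (1)
    system.dma_write(0x100, 2)   # device deposits new data in memory
    return system.cpu_read(0x100)


print(read_after_dma(invalidate_on_dma=False))  # stale read: prints 1
print(read_after_dma(invalidate_on_dma=True))   # fresh read: prints 2
```

The extra miss taken after the invalidation is the kind of “Coherence” miss the slide refers to; flushing the whole cache on every I/O operation would also give the fresh value, at the cost of discarding unrelated cached data.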
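Returning to the daisy chain arbitration scheme from earlier in the lecture, the fairness problem can be seen in a few lines of code. This is a minimal sketch, not a model of any real bus: the function names and the request trace are hypothetical, device 0 is assumed to sit closest to the arbiter (highest priority), and each list entry stands for one arbitration round.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that captures the grant, or None.

    requests[i] is True if device i is asserting the bus request.
    Device 0 is assumed to be closest to the arbiter (highest priority).
    """
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            # This device keeps the grant; devices further down never see it.
            return device
    return None


def count_grants(request_trace, num_devices):
    """Count how many arbitration rounds each device wins."""
    grants = [0] * num_devices
    for requests in request_trace:
        winner = daisy_chain_grant(requests)
        if winner is not None:
            grants[winner] += 1
    return grants


# Hypothetical trace: devices 0 and 2 both request the bus in every round.
trace = [[True, False, True]] * 5
print(count_grants(trace, 3))  # prints [5, 0, 0]: device 2 is locked out
```

A centralized parallel arbiter sees every request line at once, so it could apply a fairer policy (for example rotating priority) instead of the fixed position-based order shown here.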