Penn CIS 380 - Multiprocessors lecture notes

CSE 380: Computer Operating Systems
Instructor: Insup Lee
University of Pennsylvania, Fall 2003
Lecture Notes: Multiprocessors (updated version)

Announcement
- Colloquium by Dennis Ritchie
  - "UNIX and Beyond: Themes of Operating Systems Research at Bell Labs"
  - 4:30 pm, Wednesday, November 12
  - Wu-Chen Auditorium
- Written assignment will be posted later today

Systems with Multiple CPUs
- Collection of independent CPUs (or computers) that appears to users and applications as a single system
- Technology trends
  - Powerful, yet cheap, microprocessors
  - Advances in communications
  - Physical limits on computing power of a single CPU
- Examples
  - Network of workstations
  - Servers with multiple processors
  - Network of computers of a company
  - Microcontrollers inside a car

Advantages
- Data sharing: allows many users to share a common database
- Resource sharing: expensive devices such as a color printer
- Parallelism and speed-up: a multiprocessor system can have more computing power than a mainframe
- Better price/performance ratio than mainframes
- Reliability: fault tolerance can be provided against crashes of individual machines
- Flexibility: spread the workload over available machines
- Modular expandability: computing power can be added in small increments (upgrading CPUs like memory)

Design Issues
- Transparency: how to achieve a single-system image
  - How to hide distribution of memory from applications?
  - How to maintain consistency of data?
- Performance
  - How to exploit parallelism?
  - How to reduce communication delays?
- Scalability: as more components (say, processors) are added, performance should not degrade
  - Centralized schemes (e.g. broadcast messages) don't work
- Security

Classification
- Multiprocessors
  - Multiple CPUs with shared memory
  - Memory access delays of about 10 to 50 nsec
- Multicomputers
  - Multiple computers, each with its own CPU and memory, connected by a high-speed interconnect
  - Tightly coupled, with delays in microseconds
- Distributed Systems
  - Loosely coupled systems connected over a Local Area Network (LAN), or even long-haul networks such as the Internet
  - Delays can be seconds, and unpredictable

Multiprocessors

Multiprocessor Systems
- Multiple CPUs with a shared memory
- From an application's perspective, the difference from a single-processor system need not be visible
  - Virtual memory where pages may reside in memories associated with other CPUs
  - Applications can exploit parallelism for speed-up
- Topics to cover
  1. Multiprocessor architectures (Section 8.1.1)
  2. Cache coherence
  3. OS organization (Section 8.1.2)
  4. Synchronization (Section 8.1.3)
  5. Scheduling (Section 8.1.4)

Multiprocessor Architecture
- UMA (Uniform Memory Access)
  - Time to access each memory word is the same
  - Bus-based UMA
  - CPUs connected to memory modules through switches
- NUMA (Non-Uniform Memory Access)
  - Memory distributed (partitioned among processors)
  - Different access times for local and remote accesses

Bus-based UMA
- All CPUs and memory modules are connected over a shared bus
- To reduce traffic, each CPU also has a cache
- Key design issue: how to maintain coherency of data that appears in multiple places?
- Each CPU can also have a local memory module that is not shared with others
- Compilers can be designed to exploit the memory structure
- Typically, such an architecture can support at most 16 or 32 CPUs, because the common bus is a bottleneck (memory accesses are not parallelized)

Switched UMA
- Goal: to reduce traffic on the bus, provide multiple connections between CPUs and memory units so that many accesses can be concurrent
- Crossbar switch: grid with horizontal lines from CPUs and vertical lines from memory modules
- The crosspoint at (i, j) can connect the i-th CPU with the j-th memory module
- As long as different processors are accessing different modules, all requests can proceed in parallel
- Non-blocking: waiting is caused only by contention for memory, not for the bus
- Disadvantage: too many connections (quadratic)
- Many other networks: omega, counting, ...

Crossbar Switch
[Figure: crossbar switch]

Cache Coherence
- Many processors can have locally cached copies of the same object
  - Level of granularity can be an object or a block of 64 bytes
- We want to maximize concurrency
  - If many processors just want to read, then each one can have a local copy, and reads won't generate any bus traffic
- We want to ensure coherence
  - If a processor writes a value, then all subsequent reads by other processors should return the latest value
- Coherence refers to a logically consistent global ordering of reads and writes of multiple processors
- Modern multiprocessors support intricate schemes

Consistency and Replication
- Need to replicate (cache) to improve performance
  - How are updates propagated between cached replicas?
  - How are the replicas kept consistent?
- Keeping replicas consistent is much more complicated than on a sequential processor
  - When a processor changes the value of its copy of a variable,
    - the other copies are invalidated (invalidate protocol), or
    - the other copies are updated (update protocol).

Example
[Figure: P1's cache X = 1; P2's cache X = 1; Memory X = 1]

Invalidate vs. Update Protocols
[Figure, two panels. First panel: P1's cache X = 3; P2's cache X = 1; Memory X = 1. Second panel: P1's cache X = 3; P2's cache X = 3; Memory X = 3.]

Snoopy Protocol
- Each processor, for every cached object, keeps a state that can be Invalid, Exclusive, or Read-only
- Goal: if one processor has an Exclusive copy, then all others must be Invalid
- Each processor issues three types of messages on the bus
  - Read-request (RR), Write-request (WR), and Value-response (VR)
  - Each message identifies an object, and VR has a tagged value
- Assumptions
  - If there is contention for the bus, then only one succeeds
  - No split transactions (an RR will have a response by VR)
- The protocol is called Snoopy because everyone is listening to the bus all the time and updates state in response to RR and WR messages
- Each cache controller responds to 4 types of events
  - Read or write operation issued by its processor
  - Messages (RR, WR, or VR) observed on the bus
- Caution: this is a simplified version

Snoopy Cache Coherence
[Figure: Processor 1 through Processor N, each with a cache controller. Processor operations: Read(x), Write(x,u). Bus messages: RR(x), WR(x), VR(x,u). Cache entry fields: ID, Val, State (example entry: x, v, Exclusive).]

Snoopy Protocol
- If state is Read-only
  - Read operation: return local value
  - Write operation: broadcast WR message on bus, update state to Exclusive, and update local value
  - WR message on bus: update state to Invalid
  - RR message on bus: broadcast VR(v) on bus
- If state is Exclusive
  - Read operation: return local value
  - Write operation: update local value
  - RR message on bus: broadcast VR(v), and change state to
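The Read-only and Exclusive transitions listed above can be written as a lookup table. The Python sketch below is illustrative only: the names `State`, `TRANSITIONS`, and `step` and the event labels are hypothetical, and only transitions that appear in these notes are encoded. Where the notes mention no state change, the sketch assumes the state stays the same. Where the notes break off (the new state after an RR in the Exclusive state), the entry is left as `None` rather than guessed.

```python
from enum import Enum


class State(Enum):
    INVALID = "Invalid"
    EXCLUSIVE = "Exclusive"
    READ_ONLY = "Read-only"


# (current state, event) -> (next state, bus message to broadcast or None).
# "read"/"write" come from the local processor; "RR"/"WR" are messages
# observed on the bus. These labels are hypothetical, chosen for this sketch.
TRANSITIONS = {
    (State.READ_ONLY, "read"): (State.READ_ONLY, None),
    (State.READ_ONLY, "write"): (State.EXCLUSIVE, "WR"),
    (State.READ_ONLY, "WR"): (State.INVALID, None),
    (State.READ_ONLY, "RR"): (State.READ_ONLY, "VR"),
    (State.EXCLUSIVE, "read"): (State.EXCLUSIVE, None),
    (State.EXCLUSIVE, "write"): (State.EXCLUSIVE, None),
    # The notes are cut off before naming the next state for this case.
    (State.EXCLUSIVE, "RR"): (None, "VR"),
}


def step(state, event):
    """Return (next_state, broadcast) for one cached object.

    Raises KeyError for (state, event) pairs these notes do not cover.
    """
    return TRANSITIONS[(state, event)]


if __name__ == "__main__":
    print(step(State.READ_ONLY, "write"))
    print(step(State.READ_ONLY, "WR"))
```

Keeping the table separate from `step` makes the uncovered cases explicit: any pair missing from the table fails loudly instead of silently picking a behavior.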

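The difference between the invalidate and update protocols described in these notes can be sketched with two caches and one memory cell. This is a toy model under stated assumptions, not the lecture's implementation: `write_invalidate`, `write_update`, and `fresh` are hypothetical names, caches are plain dicts, an invalidated copy is modeled by deleting the entry, and the handling of memory (left unchanged under invalidate, written under update) is an assumption based on the values that appear on the "Invalidate vs. update protocols" slide.

```python
def write_invalidate(caches, memory, writer, var, value):
    """Writer updates its own copy; every other cached copy is dropped."""
    caches[writer][var] = value
    for name, cache in caches.items():
        if name != writer:
            cache.pop(var, None)  # invalidated: the next read must miss
    # Assumption: memory is not updated at write time, so it is untouched.


def write_update(caches, memory, writer, var, value):
    """Writer updates its own copy; other cached copies get the new value."""
    caches[writer][var] = value
    for name, cache in caches.items():
        if name != writer and var in cache:
            cache[var] = value
    memory[var] = value  # Assumption: memory is updated as well.


def fresh():
    """Initial state from the Example slide: X = 1 everywhere."""
    return {"P1": {"X": 1}, "P2": {"X": 1}}, {"X": 1}


if __name__ == "__main__":
    caches, memory = fresh()
    write_invalidate(caches, memory, "P1", "X", 3)
    print(caches, memory)

    caches, memory = fresh()
    write_update(caches, memory, "P1", "X", 3)
    print(caches, memory)
```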

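For the switched UMA design, a short sketch can show why requests to different memory modules proceed in parallel and why a crossbar needs a quadratic number of connections. The function names and the tie-break rule (earlier entries in the list win) are assumptions made for this illustration; the notes do not specify an arbitration policy.

```python
def crosspoints(num_cpus, num_modules):
    """A crossbar has one switch at every (CPU, memory module) intersection."""
    return num_cpus * num_modules


def schedule(requests):
    """Split one cycle's requests into granted and waiting.

    requests: list of (cpu, module) pairs, at most one per CPU.
    Each module serves one CPU per cycle; a later requester for the
    same module waits (assumed tie-break: list order).
    """
    busy = set()
    granted, waiting = [], []
    for cpu, module in requests:
        if module in busy:
            waiting.append((cpu, module))  # contention for memory, not the bus
        else:
            busy.add(module)
            granted.append((cpu, module))
    return granted, waiting


if __name__ == "__main__":
    # Three CPUs, three different modules: nothing waits.
    print(schedule([(0, 2), (1, 0), (2, 1)]))
    # CPUs 0 and 1 both want module 1: one of them waits.
    print(schedule([(0, 1), (1, 1), (2, 0)]))
    print(crosspoints(8, 8))
```

With equal numbers of CPUs and modules, `crosspoints(n, n)` grows as n squared, which is the disadvantage the notes point out.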