Slide titles:
CSE 380 Computer Operating Systems; Announcement; Systems with Multiple CPUs; Advantages; Design Issues; Classification; Multiprocessors; Multiprocessor Systems; Multiprocessor Architecture; Bus-based UMA; Switched UMA; Crossbar Switch; Cache Coherence; Consistency and replication; Example; Invalidate vs. update protocols; Snoopy Protocol; Snoopy Cache Coherence; Slide 19; Sample Scenario for Snoopy; Snoopy Scenario (Continued); Notions of consistency; Multiprocessor OS; Master-Slave Organization; Symmetric Multiprocessing (SMP); Synchronization; Original Solution using TSL; TSL solution for multi-processors; Busy-Waiting vs Process switch; Multiprocessors: Summary; Scheduling; Issues for Multiprocessor Scheduling; Multicomputers; Slide 34; Clusters; Switching Schemes; Interprocess Communication; Message-based Communication; User-level Communication Primitives; Blocking vs Non-blocking; Buffers and Copying; The Problem with Messages; Remote Procedure Call; A brief history of RPC; Slide 45; Steps in Remote Procedure Calls; Slide 47; RPC Call Structure; RPC Return Structure; RPC Stubs; RPC Parameter Marshalling; RPC failure semantics; Types of failure; Handling message failure; Possible semantics to deal with crashes; Shared memory vs. message passing; Distributed Shared Memory (DSM); Slide 58; DSM Implementation Issues; Distributed Shared Memory; Some Implementation Details; Cache/Memory Coherence and Consistency; False sharing in DSM; Load Balancing; Algorithms for Load Balancing

Slide 1: CSE 380 Computer Operating Systems
Instructor: Insup Lee
University of Pennsylvania
Fall 2003
Lecture Notes: Multiprocessors (updated version)

Slide 2: Announcement
• Colloquium by Dennis Ritchie: “UNIX and Beyond: Themes of Operating Systems Research at Bell Labs,” 4:30 pm, Wednesday, November 12, Wu-Chen Auditorium
• Written Assignment will be posted later today

Slide 3: Systems with Multiple CPUs
• Collection of independent CPUs (or computers) that appears to the users/applications as a single system
• Technology trends
  • Powerful, yet cheap, microprocessors
  • Advances in communications
  • Physical limits on computing power of a single CPU
• Examples
  • Network of workstations
  • Servers with multiple processors
  • Network of computers of a company
  • Microcontrollers inside a car

Slide 4: Advantages
• Data sharing: allows many users to share a common database
• Resource sharing: expensive devices such as a color printer
• Parallelism and speed-up: a multiprocessor system can have more computing power than a mainframe
• Better price/performance ratio than mainframes
• Reliability: fault-tolerance can be provided against crashes of individual machines
• Flexibility: spread the workload over available machines
• Modular expandability: computing power can be added in small increments (upgrading CPUs like memory)

Slide 5: Design Issues
• Transparency: how to achieve a single-system image
  • How to hide distribution of memory from applications?
  • How to maintain consistency of data?
• Performance
  • How to exploit parallelism?
  • How to reduce communication delays?
• Scalability: as more components (say, processors) are added, performance should not degrade
  • Centralized schemes (e.g. broadcast messages) don’t work
• Security

Slide 6: Classification
• Multiprocessors
  • Multiple CPUs with shared memory
  • Memory access delays of about 10-50 nsec
• Multicomputers
  • Multiple computers, each with its own CPU and memory, connected by a high-speed interconnect
  • Tightly coupled, with delays in microseconds
• Distributed Systems
  • Loosely coupled systems connected over a Local Area Network (LAN), or even long-haul networks such as the Internet
  • Delays can be seconds, and unpredictable

Slide 7: Multiprocessors

Slide 8: Multiprocessor Systems
• Multiple CPUs with a shared memory
• From an application’s perspective, the difference from a single-processor system need not be visible
  • Virtual memory where pages may reside in memories associated with other CPUs
  • Applications can exploit parallelism for speed-up
• Topics to cover
  1. Multiprocessor architectures (Section 8.1.1)
  2. Cache coherence
  3. OS organization (Section 8.1.2)
  4. Synchronization (Section 8.1.3)
  5. Scheduling (Section 8.1.4)

Slide 9: Multiprocessor Architecture
• UMA (Uniform Memory Access)
  • Time to access each memory word is the same
  • Bus-based UMA
  • CPUs connected to memory modules through switches
• NUMA (Non-uniform Memory Access)
  • Memory distributed (partitioned among processors)
  • Different access times for local and remote accesses

Slide 10: Bus-based UMA
• All CPUs and memory modules are connected over a shared bus
• To reduce traffic, each CPU also has a cache
• Key design issue: how to maintain coherency of data that appears in multiple places?
• Each CPU can also have a local memory module that is not shared with others
• Compilers can be designed to exploit the memory structure
• Typically, such an architecture can support only 16 or 32 CPUs, since the common bus is a bottleneck (memory access is not parallelized)

Slide 11: Switched UMA
• Goal: to reduce traffic on the bus, provide multiple connections between CPUs and memory units so that many accesses can be concurrent
• Crossbar Switch: grid with horizontal lines from CPUs and vertical lines from memory modules
• The crossbar at (i,j) can connect the i-th CPU with the j-th memory module
• As long as different processors are accessing different modules, all requests can proceed in parallel
• Non-blocking: waiting is caused only by contention for memory, not for the bus
• Disadvantage: too many connections (quadratic)
• Many other networks: omega, counting, …

Slide 12: Crossbar Switch

Slide 13: Cache Coherence
• Many processors can have locally cached copies of the same object
  • Level of granularity can be an object or a block of 64 bytes
• We want to maximize concurrency
  • If many processors just want to read, then each one can have a local copy, and reads won’t generate any bus traffic
• We want to ensure coherence
  • If a processor writes a value, then all subsequent reads by other processors should return the latest value
• Coherence refers to a logically consistent global ordering of reads and writes of multiple processors
• Modern multiprocessors support intricate schemes

Slide 14: Consistency and replication
• Need to replicate (cache) to improve performance
  • How updates are propagated between cached replicas
  • How to keep them consistent (much more complicated than on a sequential processor)
• When a processor changes the value of its copy of a variable,
  • the other copies are invalidated (invalidate protocol), or
  • the other copies are updated (update protocol).

Slide 15: Example
[Figure: P1’s cache, P2’s cache, and memory each hold X = 1]

Slide 16: Invalidate vs. update protocols
X


Penn CIS 380 - Multiprocessors
