CSE 380 Computer Operating SystemsAnnouncementSystems with Multiple CPUsAdvantagesDesign IssuesClassificationMutiprocessorsMultiprocessor SystemsMultiprocessor ArchitectureBus-based UMASwitched UMACrossbar SwitchCache CoherenceConsistency and replicationExampleInvalidate vs. update protocolsSnoopy ProtocolSnoopy Cache CoherenceSlide 19Sample Scenario for SnoopySnoopy Scenario (Continued)Notions of consistencyMultiprocessor OSMaster-Slave OrganizationSymmetric Multiprocessing (SMP)SynchronizationOriginal Solution using TSLTSL solution for multi-processorsBusy-Waiting vs Process switchMultiprocessors: SummarySchedulingIssues for Multiprocessor SchedulingMulticomputersSlide 34ClustersSwitching SchemesInterprocess CommunicationMessage-based CommunicationUser-level Communication PrimitivesBlocking vs Non-blockingBuffers and CopyingThe Problem with MessagesRemote Procedure CallA brief history of RPCSlide 45Steps in Remote Procedure CallsSlide 47RPC Call StructureRPC Return StructureRPC StubsRPC Parameter MarshallingRPC failure semanticsTypes of failureHandling message failurePossible semantics to deal with crashesShared memory vs. message passingDistributed Shared Memory (DSM)Slide 58DSM Implementation IssuesDistributed Shared MemorySome Implementation DetailsCache/Memory Coherence and ConsistencyFalse sharing in DSMLoad BalancingAlgorithms for Load Balancing1CSE 380Computer Operating SystemsInstructor: Insup LeeUniversity of PennsylvaniaFall 2003Lecture Notes: Multiprocessors (updated version)2AnnouncementColloq by Dennis Ritchie“UNIX and Beyond: Themes of Operating Systems Research at Bell Labs," 4:30 pm, Wednesday, November 12Wu-Chen AuditoriumWritten Assignment will be post later today3Systems with Multiple CPUsCollection of independent CPUs (or computers) that appears to the users/applications as a single systemTechnology trendsPowerful, yet cheap, microprocessorsAdvances in communicationsPhysical limits on computing power of a single CPUExamplesNetwork of workstationsServers with multiple processorsNetwork of computers of a companyMicrocontrollers inside a car4AdvantagesData sharing: allows many users to share a common data baseResource sharing: expensive devices such as a color printerParallelism and speed-up: multiprocessor system can have more computing power than a mainframeBetter price/performance ratio than mainframesReliability: Fault-tolerance can be provided against crashes of individual machinesFlexibility: spread the workload over available machinesModular expandability: Computing power can be added in small increments (upgrading CPUs like memory)5Design IssuesTransparency: How to achieve a single-system imageHow to hide distribution of memory from applications?How to maintain consistency of data?PerformanceHow to exploit parallelism?How to reduce communication delays?Scalability: As more components (say, processors) are added, performance should not degradeCentralized schemes (e.g. broadcast messages) don’t workSecurity6ClassificationMultiprocessorsMultiple CPUs with shared memoryMemory access delays about 10 – 50 nsecMulticomputersMultiple computers, each with own CPU and memory, connected by a high-speed interconnect Tightly coupled with delays in micro-secondsDistributed SystemsLoosely coupled systems connected over Local Area Network (LAN), or even long-haul networks such as InternetDelays can be seconds, and unpredictable7Mutiprocessors8Multiprocessor SystemsMultiple CPUs with a shared memoryFrom an application’s perspective, difference with single-processor system need not be visibleVirtual memory where pages may reside in memories associated with other CPUsApplications can exploit parallelism for speed-upTopics to cover 1. Multiprocessor architectures (Section 8.1.1)2. Cache coherence3. OS organization (Section 8.1.2)4. Synchronization (Section 8.1.3)5. Scheduling (Section 8.1.4)9Multiprocessor ArchitectureUMA (Uniform Memory Access)Time to access each memory word is the sameBus-based UMA CPUs connected to memory modules through switchesNUMA (Non-uniform memory access)Memory distributed (partitioned among processors)Different access times for local and remote accesses10Bus-based UMAAll CPUs and memory module connected over a shared busTo reduce traffic, each CPU also has a cacheKey design issue: how to maintain coherency of data that appears in multiple places?Each CPU can have a local memory module also that is not shared with othersCompilers can be designed to exploit the memory structure Typically, such an architecture can support 16 or 32 CPUs as a common bus is a bottleneck (memory access not parallelized)11Switched UMAGoal: To reduce traffic on bus, provide multiple connections between CPUs and memory units so that many accesses can be concurrentCrossbar Switch: Grid with horizontal lines from CPUs and vertical lines from memory modulesCrossbar at (i,j) can connect i-th CPU with j-th memory moduleAs long as different processors are accessing different modules, all requests can be in parallelNon-blocking: waiting caused only by contention for memory, but not for busDisadvantage: Too many connections (quadratic)Many other networks: omega, counting, …12Crossbar Switch13Cache CoherenceMany processors can have locally cached copies of the same objectLevel of granularity can be an object or a block of 64 bytesWe want to maximize concurrencyIf many processors just want to read, then each one can have a local copy, and reads won’t generate any bus trafficWe want to ensure coherenceIf a processor writes a value, then all subsequent reads by other processors should return the latest valueCoherence refers to a logically consistent global ordering of reads and writes of multiple processorsModern multiprocessors support intricate schemes14Consistency and replicationNeed to replicate (cache) to improve performanceHow updates are propagated between cached replicasHow to keep them consistentHow to keep them consistency (much more complicated than sequential processor)When a processor change the vale value of its copy of a variable,•the other copies are invalidated (invalidate protocol), or•the other copies are updated (update protocol).15ExampleX = 1X = 1P1’s cacheP2’s cacheMemoryX = 116Invalidate vs. update protocolsX
View Full Document