Shared Memory Multiprocessors
Ken Birman
Draws extensively on slides by Ravikant Dintyala

Big picture debate
How best to exploit hardware parallelism?
–“Old” model: develop an operating system married to the hardware; use it to run one of the major computational science packages
–“New” models: seek to offer a more transparent way of exploiting parallelism
Today’s two papers offer distinct perspectives on this topic

Contrasting perspectives
Disco:
–Here, the basic idea is to use a new VMM to make the parallel machine look like a very fast cluster
–Disco runs commodity operating systems on it
Question raised:
–Given that interconnects are so fast, why not just buy a real cluster?
–Disco: focus is on benefits of shared VM

Time warp…
As it turns out, Disco found a commercially important opportunity
–But it wasn’t exploitation of ccNUMA machines
–Disco morphed into VMWare, a major product for running Windows on Linux and vice versa
–Company was ultimately sold for $550M… proving that research can pay off!

Contrasting perspectives
Tornado:
–Here, the assumption is that shared memory will be the big attraction to end users
  But performance can be whacked by contention and false sharing
  Want the “illusion” of sharing but a hardware-sensitive implementation
–They also believe that the user is working in an OO paradigm (today we would point to languages like Java and C#, or platforms like .NET and CORBA)
–Goal becomes: provide amazingly good support for shared component integration in a world of threads and objects that interact heavily

Bottom line here?
Key idea: clustered object
–Looks like a shared object
–But actually implemented cleverly, with one local object instance per thread…
Tornado was interesting…
–… and got some people PhDs and tenure
–… but it ultimately didn’t change the world in any noticeable way
Why?
–Is this a judgment on the work? (Very architecture-dependent)
–Or a comment about the nature of “majority” OS platforms (Linux, Windows, perhaps QNX)?

Trends when work was done
A period when multiprocessors were
–Fairly tightly coupled, with memory coherence
–Viewed as a possible cost/performance winner for server applications
And cluster interconnects were still fairly slow
Research focused on several kinds of concerns:
–Higher memory latencies; TLB management is critical
–Large write-sharing costs on many platforms
–Large secondary caches needed to mask disk delays
–NUMA h/w, which suffers from false sharing of cache lines
–Contention for shared objects
–Large system sizes

OS Issues for multiprocessors
–Efficient sharing
–Scalability
–Flexibility (keep pace with new hardware innovations)
–Reliability

Ideas
–Statically partition the machine and run multiple, independent OSs that export a partial single-system image (map locality and independence in the applications to their servicing – locality-aware scheduling and caching/replication hiding NUMA)
–Partition the resources into cells that coordinate to manage the hardware resources efficiently and export a single system image
–Handle resource management in a separate wrapper between the hardware and the OS
–Design a flexible object-oriented framework that can be optimized in an incremental fashion

Virtual Machine Monitor
–Additional layer between the hardware and the operating system
–Provides a hardware interface to the OS, manages the actual hardware
–Can run multiple copies of the operating system
–Fault containment – OS and hardware
–Open issues: overhead, uninformed resource management, communication and sharing between virtual machines?

DISCO
[Architecture diagram: commodity operating systems (OS, SMP-OS, Thin OS) run on the DISCO layer, which manages a ccNUMA multiprocessor of processing elements (PEs) joined by an interconnect]

Interface
–Processors – MIPS R10000 processor (kernel pages in unmapped segments)
–Physical memory – contiguous physical address space starting at address zero (non-NUMA-aware)
–I/O devices – virtual disks (private/shared); virtual networking (each virtual machine is assigned a distinct link-level address on an internal virtual subnet managed by DISCO; for communication with the outside world, DISCO acts as a gateway); other devices have appropriate device drivers

Implementation
–Virtual CPU
–Virtual physical memory
–Virtual I/O devices
–Virtual disks
–Virtual network interface
All in 13,000 lines of code

Major Data Structures
[Diagram of DISCO’s major data structures]

Virtual CPU
–Virtual CPUs time-shared across the physical processors (under “data locality” constraints)
–Each virtual CPU has a “process table entry” + privileged registers + TLB contents
–DISCO runs in kernel mode, the hosted OS in supervisor mode, applications in user mode
–Operations that cannot be issued in supervisor mode are emulated (on trap – update the privileged registers of the virtual processor and jump to the virtual machine’s trap vector)

Virtual Physical Memory
–Mapping from physical address (virtual machine physical) to machine address maintained in pmap
–Processor TLB contains the virtual-to-machine mapping
–Kernel pages – relink the operating system code and data into the mapped region
–Recent TLB history saved in a second-level software TLB