Author: Greg Astfalk
Division: HPSD
Topic: PetaFlops-II Conference
Date: February 18, 1999

Slide 2 of 25
● My comments are predicated on the assumption that the operating system needs to be, at a minimum, “Unix-like”
● The PetaOps system must fit into the existing computing universe
  ■ system APIs
  ■ networking
  ■ required middleware
  ■ remote code development
  ■ existing code base
  ■ etc.

Slide 3 of 25
This space intentionally left blank.

Slide 4 of 25
● At past PetaOps workshops I have sounded cynical or “sour-grapes”
● Today is no exception
● In my defense, I view it as being pragmatic

Slide 5 of 25
● If we can get the HTMT processors at 100 GHz (say 200 Gflops)
  ■ 5,000 processors
● If we are forced to go with commodity processors at, say, 8 Gflops each
  ■ 125,000 processors

Slide 6 of 25
● For latency hiding we need somewhere between 1 million and 100 million threads
● Let’s stop for just a moment and think about this….
● Okay, now let’s get started again

Slide 7 of 25
● Question:
  ■ What is the most difficult application on the planet to parallelize?
● Answer:
  ■ Unix

Slide 8 of 25
● Where are we today?
● The largest SSI Unix system is 256 processors
● We are a factor of 20–500 off the mark
● From first-hand experience
  ■ every factor of 2 increase in o/s scalability induces at least a factor of 10 in effort
● Why is it that SGI’s and HP’s architectures support N-processor ccNUMA and the o/s is at N/x, where x > 1?
  ■ Because it’s hard to do otherwise!

Slide 9 of 25
● Unix is a single-processor operating system that has been stretched to handle SMPs
● The fundamental structure of the Unix internals precludes its scalability, without a complete overhaul, to thousands of processors
● Two significant areas of concern
  ■ the process manager (PM)
  ■ the virtual memory manager (VM)
● Too many critical sections

Slide 10 of 25
● The amount of shared information in the internal structures
  ■ the proc structure is especially nasty
    ▲ it has been a catch-basin for years
● The need to maintain a single-system image
● Aside: the internal data structures have not changed all that much since the Thompson and Ritchie days

Slide 11 of 25
● Unix LOVES linked lists
  ■ Think about walking a linked list at PetaOps scale versus comparing against a single field in a structure
● Maintaining consistency in VM is particularly troubling
  ■ many memory levels, separate page pools, consistency between “nodes”, etc.
● Data movement within the o/s
  ■ the buffer cache, for example

Slide 12 of 25
● Should the architecture influence (dictate?) the operating system?
● Should the operating system force architectural decisions?
● This is not a rhetorical question

Slide 13 of 25
● The sheer component count of a PetaOps system dictates frequent failures in all types of components
  ■ By “frequent” I mean minutes
● The operating system must be resilient to failures of
  ■ disks (easy :-)
  ■ processors
  ■ memory
  ■ interconnects
  ■ ASICs
● This has a profound effect on the o/s

Slide 14 of 25
● Speaking from first-hand experience, building an o/s that is robust in the face of failures is a very difficult problem
● To not have this capability in the o/s for a PetaOps system is a recipe for certain failure
  ■ failure here means a system that is always either booting or doing application start-up

Slide 15 of 25
● A slight digression that is related to the o/s
● What programming model do we want or need?
  ■ shared-memory
  ■ distributed-memory
● More on this later

Slide 16 of 25
● Assume we are going to go with a single-address-space PetaOps system
  ■ We are in serious trouble here
● I have no genuine feeling of possible success here unless the o/s is “completely” restructured
● We do have some data from which we can extrapolate
  ■ It is not encouraging

Slide 17 of 25
● Targeting 2007 for availability
● A relatively small team could immediately begin the redesign and define the internals
● Architectural simulators will be required far in advance of the actual hardware
● As the specifics of the machine become available, the machdep work could begin

Slide 18 of 25
● This benign-sounding approach is not trivial
● Availability is still a significant issue
● The scale of the operating system’s domains is more manageable, say, 500 processors
  ■ Unix as it exists today might suffice
● This implies a message-passing programming model

Slide 19 of 25
● If it’s distributed, it is MPI
  ■ read “MPI” as “message-passing” in what follows
● Let’s consider the consequences of MPI, at the PetaOps scale, on the operating system
● In what follows I am not “picking on” MPI
  ■ it is a vehicle to point out the operating system
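The processor counts on slide 5 follow from straightforward division of the 1 PetaFlops target by the per-processor rate; a quick back-of-envelope check, using only the flop rates stated on the slide:

```python
# Back-of-envelope check of slide 5: processors needed to reach 1 PetaFlops.
PFLOPS = 1e15  # target: 10^15 floating-point operations per second

htmt_rate = 200e9     # hoped-for HTMT processor: 100 GHz, ~200 Gflops each
commodity_rate = 8e9  # commodity processor of the era: ~8 Gflops each

htmt_count = PFLOPS / htmt_rate            # 5,000 processors
commodity_count = PFLOPS / commodity_rate  # 125,000 processors

print(int(htmt_count), int(commodity_count))
```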
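Slide 11’s linked-list complaint can be made concrete. The following is a toy sketch, not actual Unix internals (`Proc`, `walk_steps`, and `build_proc_list` are invented for illustration): locating one entry on a kernel-style linked list costs a walk proportional to the number of processes, which is exactly what hurts at the 1–100 million threads of slide 6.

```python
# Toy model of a kernel proc list: a singly linked list that must be
# walked to find one entry, versus an indexed lookup on the same data.

class Proc:
    __slots__ = ("pid", "next")
    def __init__(self, pid):
        self.pid = pid
        self.next = None

def build_proc_list(n):
    """Build a linked list of n Proc entries with pids 0..n-1."""
    head = Proc(0)
    tail = head
    for pid in range(1, n):
        tail.next = Proc(pid)
        tail = tail.next
    return head

def walk_steps(head, pid):
    """Return how many list nodes are touched before pid is found (O(n))."""
    steps, p = 0, head
    while p is not None:
        steps += 1
        if p.pid == pid:
            return steps
        p = p.next
    return -1

n = 100_000  # scaled well down from the slide's millions of threads
head = build_proc_list(n)
print(walk_steps(head, n - 1))  # finding the last entry touches all n nodes
```

An indexed table would answer the same query by comparing a single field, independent of n, which is the contrast the slide is drawing.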