Operating System Support for Pipeline Parallelism on Multicore Architectures

John Giacomoni and Manish Vachharajani
University of Colorado at Boulder

Abstract

The industry-wide shift to multicore architectures presents the software development community with an opportunity to revisit fundamental programming models and resource management strategies. Continuing to track the historical performance gains enabled by Moore's law with multicores may be difficult, as many applications are fundamentally sequential and not amenable to data- or task-parallel organizations. Fortunately, an important subset of these applications stream data (e.g., video processing, network frame processing, and scientific computing) and can be decomposed into pipeline-parallel structures, delivering increases proportional to the pipeline depth (2x, 3x, etc.).

Realizing the potential of pipeline-parallel software organizations requires reexamining some basic historical assumptions in OS design, including the purpose of time-sharing and the nature of applications. The key architectural change is that multicore architectures make it possible to fully dedicate resources as needed without compromising existing OS services. This paper describes the minimal OS extensions necessary to support efficient pipeline-parallel applications on multicore systems, with supporting evidence from the domain of network frame processing.

1 Introduction

The industry-wide shift to multicore(1) architectures presents the software development community with a rare opportunity to revisit fundamental programming models and resource management strategies. Multicore systems are now present in every class of system, including embedded systems, workstations, and laptops. The question that must be addressed by the systems community is how to utilize the additional computational resources and what minimum OS changes are needed to maximize their potential. The obvious use is to improve overall system throughput by increasing task and data parallelism. However, there exists an important set of applications that are sequential and thus cannot use task or data parallelism to achieve performance improvements.

(1) We use the term "multicore" to refer to systems with 4 to 100 processing cores [4].

For sequential and other applications, an appealing option is to use the additional resources for important novel programming tasks such as shadow profiling [16] and transient fault tolerance [21]. Shadow profiling works by running a snapshot of a process on a separate core to perform deep instrumentation, while the fault tolerance work runs process clones in parallel to detect and correct transient soft errors without additional hardware support. By using multiple cores, both systems extend the system's functionality without impacting performance.

The assumption in the above scenarios is that multicores augment the historical per-core performance increases. The reality is that limitations arising from power consumption, design complexity, and wire delays constrain our ability to increase the computational capabilities of a single core. Fortunately, Moore's law continues to hold, and it is possible to keep increasing resources by doubling cores according to the historic exponential growth in transistor density. Therefore it is possible to continue receiving dividends from existing data- and task-parallel strategies.
Sequential applications, by contrast, have traditionally relied upon ever-increasing processor performance for their performance improvements. Fortunately, many sequential applications of interest, such as video decoding, network frame processing, and scientific computing, while sequential in nature, may be restructured, either by hand or by an optimizing compiler, to expose innate pipeline parallelism. Pipelines are instantiated in software by binding pipeline stages to different threads of execution and feeding data serially through the stages; for optimal performance these stages are simultaneously bound to different processors (a sketch of such an organization appears at the end of this section).

In this work we focus on efficiently supporting those applications that are streaming in nature and can be restructured into a pipeline-parallel form. Pipeline-parallel structures are of interest because they can deliver performance increases proportional to the pipeline depth; a basic three-stage networking application (Section 2.2) can increase either its available computation time or its throughput by approximately three times. We know of no other technique, short of a high-level redesign, that can deliver equivalent increases on sequential applications.

These sequential applications exhibit a critical property that makes them particularly suited to a pipeline-parallel decomposition: data streams sequentially through a well-defined code path from input to output. This sequential flow is relatively easy to analyze and decompose into pipeline-parallel components. For example, video processing algorithms (e.g., MPEG) and the basic TCP/IP stack have very well-defined boundaries that can be used to recover pipeline stages without much effort.

In situations where applications appear to be fundamentally sequential, such as the SPECint benchmarks, recovering parallelism may not be possible without detailed knowledge of the machine and a thorough code analysis. Compiler techniques such as Decoupled Software Pipelining [17–19] may extract some pipeline parallelism, yielding on average 10% performance improvements by performing fine-grain parallelization with dedicated resources.

In our work on exploiting pipeline parallelism for network frame processing and scientific computing, we found that widely deployed general-purpose operating systems (e.g., Unix variants, Windows, and Mac OS X) are not prepared to efficiently support pipeline-parallel applications. This is because these general-purpose OSes are designed to optimize overall throughput in a resource-constrained (i.e., oversubscribed) environment while maintaining acceptable interactive behavior. This behavior historically made sense in the era of the Computer Utility, first proposed by John McCarthy, and later with personal computers, where the number of tasks to be serviced dwarfed the available computational resources.

1.1 Claims

Multicore architectures alter the landscape by providing sufficient resources to handle background tasks while dedicating resources to performance-critical tasks. This is the key observation upon which our work is based.

We suggest that when dealing with multicore architectures,
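The pipeline instantiation described earlier in this section can be made concrete. What follows is a minimal sketch, not the paper's implementation: it assumes a Linux system, uses pthreads with the Linux-specific pthread_setaffinity_np call to pin each stage to its own core, and connects adjacent stages with a simplified spin-based single-producer/single-consumer ring. The stage names (input, work, output), the core numbers, the queue size, and the integer stand-in for a frame are all illustrative assumptions.

/*
 * Illustrative three-stage pipeline: each stage is a thread pinned to a
 * dedicated core, with frames flowing serially input -> work -> output.
 * This is a simplified sketch; a production queue would use proper atomics.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define QUEUE_SLOTS 1024
#define NUM_FRAMES  100000L

/* Single-producer/single-consumer ring connecting adjacent stages. */
struct spsc_queue {
    void *slot[QUEUE_SLOTS];
    volatile size_t head;              /* advanced by the consumer */
    volatile size_t tail;              /* advanced by the producer */
};

static struct spsc_queue in_to_work, work_to_out;

static void enqueue(struct spsc_queue *q, void *frame)
{
    while ((q->tail + 1) % QUEUE_SLOTS == q->head)
        ;                              /* spin: queue full */
    q->slot[q->tail] = frame;
    __sync_synchronize();              /* publish the slot before moving tail */
    q->tail = (q->tail + 1) % QUEUE_SLOTS;
}

static void *dequeue(struct spsc_queue *q)
{
    while (q->head == q->tail)
        ;                              /* spin: queue empty */
    __sync_synchronize();              /* observe the slot the producer wrote */
    void *frame = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    return frame;
}

/* Pin the calling thread to one core so the stage never migrates. */
static void bind_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Stage 1: input (e.g., frames arriving from a NIC). */
static void *input_stage(void *arg)
{
    (void)arg;
    bind_to_core(1);
    for (long i = 0; i < NUM_FRAMES; i++)
        enqueue(&in_to_work, (void *)(i + 1)); /* stand-in frame pointer */
    return NULL;
}

/* Stage 2: per-frame work (real processing would go here). */
static void *work_stage(void *arg)
{
    (void)arg;
    bind_to_core(2);
    for (long i = 0; i < NUM_FRAMES; i++)
        enqueue(&work_to_out, dequeue(&in_to_work));
    return NULL;
}

/* Stage 3: output (e.g., frame transmission); here it just drains. */
static void *output_stage(void *arg)
{
    (void)arg;
    bind_to_core(3);
    for (long i = 0; i < NUM_FRAMES; i++)
        dequeue(&work_to_out);
    return NULL;
}

int main(void)
{
    pthread_t in, work, out;
    pthread_create(&in,   NULL, input_stage,  NULL);
    pthread_create(&work, NULL, work_stage,   NULL);
    pthread_create(&out,  NULL, output_stage, NULL);
    pthread_join(in, NULL);
    pthread_join(work, NULL);
    pthread_join(out, NULL);
    puts("pipeline drained");
    return 0;
}

Because the three stages run concurrently on dedicated cores, steady-state throughput is governed by the slowest stage rather than by the sum of all stages; if the sequential per-frame cost divides into three roughly equal parts, each stage gains roughly three times the per-frame time budget at a given frame rate, which is the intuition behind the approximately threefold figure cited above for the three-stage networking application.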