Berkeley COMPSCI 258 - Virtual Stream - D2108716

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 258> Virtual Stream

DOC PREVIEW

Berkeley COMPSCI 258 - Virtual Stream

School name University of California, Berkeley

Course Compsci 258- Parallel Processors

Pages 13

This preview shows page 1-2-3-4 out of 13 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

1Virtual StreamsPerformance-Robust Parallel I/OZ. Morley Mao and Noah Treuhaft{zmao, noah}@cs.berkeley.eduComputer Science DivisionUniversity of California at BerkeleyAbstractWe present Virtual Streams as a solution to performance heterogeneity for I/O-bound applications in acluster environment. We describe our improvements to the current version of Graduated Declustering, animplementation of Virtual Streams, by exploring a new distributed load-balancing algorithm – response-rate based algorithm that reduces seek overhead, especially when there is no performance heterogeneity.This algorithm not only addresses performance difference among data producers, but also hides thevarying rate of progress of data consumers. The system using this new algorithm is shown to be stable andto converge to its final bandwidth value within reasonably amount of time. A new mechanism, serverrequest handoff, is developed to address perturbations occurring when there are many outstandingrequests at the server. The system is extended with the ability to handle writes to provide both performanceand data availability. Finally, a usable abstraction of our software infrastructure is developed and studiedfor the convenience of application programmers developing streaming I/O applications in a clusterenvironment.1. IntroductionClusters are an attractive architecture for I/O intensive applications for many reasons. The commoditycomponents that compose a cluster offer excellent cost-to-performance ratios. Furthermore, the ability tostart with a modest system and gradually add new components (incremental scalability) affords dualbenefits: cluster growth can track application growth closely, and new components can track technologyimprovements. Most importantly, clusters offer excellent I/O parallelism and scalability. Availablenetwork links, I/O buses, and disk drives increase linearly with the number of cluster nodes, ensuring thatI/O throughput scales well.Unfortunately, clusters exhibit performance heterogeneity, particularly within their I/O systems.Incremental scaling leads to a mix of disks and I/O buses with varying performance characteristics. Insidethe disks themselves, bad-block remapping and non-sequential file layouts cause unexpected seeks, whilemulti-zone behavior significantly reduces bandwidth for inner-track files. Finally, at the software level,operating system services and other applications compete for resources, rendering performance of thoseresources unpredictable.Performance heterogeneity has a significant implication for I/O-intensive programs. In the simplest case,design of these programs begins with the partitioning of data across cluster disks. A parallel programaccessing this data can then be structured with one process per disk. We use the term physical stream todescribe this one-to-one relationship between processes and disks, emphasizing the access request stream’sdirect dependence on a single device. Clearly, the behavior of physical streams is subject to the vagaries ofperformance heterogeneity. When the throughput of a physical stream slows, the I/O-bound processaccessing it will slow down, and so in turn will the parallel program to which that process belongs. As aresult of this global slowdown, other physical streams will be underutilized.To achieve consistently high performance, I/O-intensive parallel programs must take advantage of allavailable cluster I/O throughput at all times. We propose that these programs will be well served by anabstraction over physical streams that yields all available I/O throughput. We call this abstraction VirtualStreams. Virtual Streams decouple parallel program performance from that of the underlying physicalstreams by evenly distributing I/O throughput to all processes. Over the course of the program’s lifetime,2Virtual Streams deliver to each process an equal share of the aggregate disk throughput, equalizing thoseshares frequently to ensure that processes see similar throughputs at all times. Running atop such a layer,I/O-bound programs will continually receive the entire available throughput of the cluster I/O system.Because of their adaptivity, Virtual Streams offer precisely what I/O-intensive parallel programs need toachieve near-peak common case performance on clusters.From this perspective, Graduated Declustering (introduced as part of the River data-flow programmingenvironment and I/O substrate for clusters of computers [1]) can be viewed as the basis for animplementation of Virtual Streams. In this work, we begin to investigate the Graduated Declusteringimplementation space. We also explore the additional pieces necessary to complete a full implementationof Virtual Streams in a realistic cluster environment.2. Design RationaleThe design of the system strives to achieve a number of important goals, which we believe are essential inan adaptive parallel I/O system.2.1 Performance AdaptivenessThe system should quickly adapt to the arrival of any performance perturbations after they reach a certainthreshold. This threshold, also known as the low watermark is a benefit vs. cost measure. It is determinedempirically and depends on the overhead of invoking the dynamic load balancing mechanism. Loadbalancing is done only when the perturbation is bad enough to guarantee that the benefit can offset the cost.Another related criterion for adaptiveness is the speed at which the system can adjust to the departure ofany existing performance perturbations. Ideally the system should utilize the total available bandwidth andevenly distribute available disk throughput among all parallel processes at any given time.However, there exists an inherent limitation in the bandwidth equalization. The maximally achievablebandwidth is determined by the sum of the bandwidth of all replicas for every replica. If each of thesesums is at least as large as the total bandwidth of the disks divided by the number of consumers, then it ispossible for each client to achieve the same bandwidth. This can be shown more clearly by the followingequations:(The second equation should hold for all files. Bk is the sum of bandwidth for all disks that has the replicaof this particular replica k.)2.2 ScalabilityThe system should scale

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2-3-4 out of 13 pages.

Berkeley COMPSCI 258 - Virtual Stream

Sign up for free to view:

Please select your school