UCF COT 4810 - The Google File System

The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Google*

ABSTRACT

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.

In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

Categories and Subject Descriptors
D.4.3 [Distributed file systems]

General Terms
Design, reliability, performance, measurement

Keywords
Fault tolerance, scalability, data storage, clustered storage

* The authors can be reached at the following addresses: {sanjay,hgobioff,shuntak}@google.com.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SOSP'03, October 19–22, 2003, Bolton Landing, New York, USA.
Copyright 2003 ACM 1-58113-757-5/03/0010 ...$5.00.

1. INTRODUCTION

We have designed and implemented the Google File System (GFS) to meet the rapidly growing demands of Google's data processing needs. GFS shares many of the same goals as previous distributed file systems such as performance, scalability, reliability, and availability. However, its design has been driven by key observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system design assumptions. We have reexamined traditional choices and explored radically different points in the design space.

First, component failures are the norm rather than the exception. The file system consists of hundreds or even thousands of storage machines built from inexpensive commodity parts and is accessed by a comparable number of client machines. The quantity and quality of the components virtually guarantee that some are not functional at any given time and some will not recover from their current failures. We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.

Second, files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. When we are regularly working with fast growing data sets of many TBs comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes have to be revisited.

Third, most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. A variety of data share these characteristics. Some may constitute large repositories that data analysis programs scan through. Some may be data streams continuously generated by running applications. Some may be archival data. Some may be intermediate results produced on one machine and processed on another, whether simultaneously or later in time. Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.

Fourth, co-designing the applications and the file system API benefits the overall system by increasing our flexibility. For example, we have relaxed GFS's consistency model to vastly simplify the file system without imposing an onerous burden on the applications. We have also introduced an atomic append operation so that multiple clients can append concurrently to a file without extra synchronization between them. These will be discussed in more detail later in the paper.
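To make the record-append contract concrete, here is a minimal sketch in Go. Since the real GFS client library is not public, the toyFile type and its recordAppend method are hypothetical stand-ins; they only mimic the semantics described above: the storage system applies each record atomically at an offset it chooses and returns that offset to the caller, so concurrent producers never coordinate with one another.

    // Toy illustration of GFS-style atomic record append semantics.
    // This is NOT the real GFS client API; toyFile and recordAppend are
    // hypothetical stand-ins that only model the contract: each append is
    // applied atomically at an offset chosen by the system, so concurrent
    // producers need no synchronization among themselves.
    package main

    import (
        "fmt"
        "sync"
    )

    // toyFile mimics a file for which the storage system, not the client,
    // serializes appends and picks the offset.
    type toyFile struct {
        mu   sync.Mutex // stands in for the storage side ordering appends
        data []byte
    }

    // recordAppend appends rec atomically and returns the offset at which
    // it was written, mirroring the shape of a record-append call.
    func (f *toyFile) recordAppend(rec []byte) int64 {
        f.mu.Lock()
        defer f.mu.Unlock()
        off := int64(len(f.data))
        f.data = append(f.data, rec...)
        return off
    }

    func main() {
        f := &toyFile{}
        var wg sync.WaitGroup

        // Several producers append concurrently without coordinating with
        // each other; each simply learns where its record ended up.
        for i := 0; i < 5; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                rec := []byte(fmt.Sprintf("record-from-client-%d;", id))
                off := f.recordAppend(rec)
                fmt.Printf("client %d: record landed at offset %d\n", id, off)
            }(i)
        }
        wg.Wait()
        fmt.Printf("file length: %d bytes\n", len(f.data))
    }

With ordinary writes, the producers themselves would have to agree on non-overlapping offsets, for example through an external lock service; record append moves that burden into the file system.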
Multiple GFS clusters are currently deployed for different purposes. The largest ones have over 1000 storage nodes, over 300 TB of disk storage, and are heavily accessed by hundreds of clients on distinct machines on a continuous basis.

2. DESIGN OVERVIEW

2.1 Assumptions

In designing a file system for our needs, we have been guided by assumptions that offer both challenges and opportunities. We alluded to some key observations earlier and now lay out our assumptions in more detail.

• The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.

• The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.

• The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. Successive operations from the same client often read through a contiguous region of a file. A small random read typically reads a

