ODU CS 791 - Google File System

This preview shows pages 1-2, 15-17, and 32-33 of 33 pages.

Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Presented by Vijay Reddy Mara and Radhika Malladi

Overview
• Introduction
• Design Overview
• System Interactions
• Master Operations
• Fault Tolerance and Diagnosis
• Measurements
• Experiences
• Conclusion

Introduction
1. GFS was designed to meet the demands of Google's data-processing needs; chunk data is stored as Linux files on commodity machines.
2. Component failures are the norm, so the system must provide constant monitoring, error detection, fault tolerance, and automatic recovery.
3. Huge files: the system stores a modest number of large files (a few million), each typically 100 MB or larger.
4. Appending new data: most mutations append new data to existing files rather than overwrite existing data.
5. Co-designing the applications and the file system API increases flexibility.

Design Overview
1. Assumptions
• The system is built from many inexpensive commodity components that often fail.
• It stores a modest number of large files, each typically 100 MB or larger.
• The workloads consist primarily of two kinds of reads:
  i. large streaming reads
  ii. small random reads
• The workloads also have many large, sequential writes that append data to files.
• The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
• High sustained bandwidth is more important than low latency.

GFS Semantics
• Normal semantics: create, delete, open, close, read, write
• GFS-specific semantics: atomic record appends, snapshots

GFS Architecture
• Single master
• Multiple chunk servers
• Multiple clients
• The master maintains:
  • all file system metadata
  • access control information, the mapping from files to chunks, and the current locations of chunks
  • communication with the chunk servers, through which it sends instructions and keeps track of chunk locations
• Chunk servers:
  • files are divided into fixed-size chunks
  • each chunk is identified by a 64-bit chunk handle assigned by the master
  • chunks are stored on local disk as Linux files
• Clients ask the master which chunk server holds the current lease, then read and write chunk data directly through the chunk servers.

3. Architecture
(Slide figure: the GFS architecture diagram.)

5. Chunk Size
• 64 MB
• Advantages of a large chunk size:
  • reduces the clients' need to interact with the master
  • a client can perform many operations on a given chunk over a single connection
• Disadvantage of a large chunk size with lazy space allocation:
  • a small file occupies a single chunk; if many clients access that file simultaneously, its chunk servers become hotspots

6. Metadata
• The master stores three types of metadata:
  • the file and chunk namespaces
  • the mapping from files to chunks
  • the locations of each chunk's replicas
• All metadata is kept in the master's in-memory data structures, which makes master operations fast; in exchange, the capacity of the whole system (the total number of chunks) is limited by the master's memory.
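To get a feel for the in-memory limit, here is a back-of-the-envelope sketch. The GFS paper reports less than 64 bytes of metadata per 64 MB chunk; the helper function and the exact per-chunk figure used here are illustrative assumptions, not GFS code:

```python
# Rough estimate of master memory needed for chunk metadata, assuming
# ~64 bytes of metadata per chunk and 64 MB chunks (figures quoted in
# the GFS paper; real overheads vary).

CHUNK_SIZE = 64 * 2**20          # 64 MB per chunk
METADATA_PER_CHUNK = 64          # ~64 bytes of metadata per chunk

def master_metadata_bytes(total_file_bytes: int) -> int:
    """Master memory needed to track the chunks for the given data size."""
    # Each chunk covers up to CHUNK_SIZE bytes of file data (ceiling division).
    n_chunks = -(-total_file_bytes // CHUNK_SIZE)
    return n_chunks * METADATA_PER_CHUNK

# One pebibyte of file data fits in about 1 GiB of chunk metadata.
one_pb = 2**50
print(master_metadata_bytes(one_pb) / 2**30, "GiB")  # → 1.0 GiB
```

Under these assumptions, even a petabyte-scale deployment keeps chunk metadata comfortably within one machine's RAM, which is why a single in-memory master was a workable design point.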
• Chunk locations: the master does not maintain a persistent record of which chunk servers hold a replica of a given chunk; it learns this from the chunk servers themselves.
• Operation log:
  • contains a historical record of critical metadata changes
  • metadata changes are not made visible to clients until they are recorded in the log
  • the log is replicated on multiple remote machines

Consistency Model
• The state of a file region after a data mutation depends on whether the mutation succeeds or fails.
• A file region is consistent if all clients see the same data, regardless of which replicas they read from.
• A region is defined after a mutation if it is consistent and clients see what the mutation wrote in its entirety.
• A region is undefined (but consistent) if all clients see the same data, but it may not reflect what any single mutation has written.

System Interactions
• How the client, master, and chunk servers interact to implement data mutations
• Atomic record append
• Snapshot

1. Leases and mutation order
• Mutation: an operation that changes the contents or metadata of a chunk, such as a write, an append, or a create.
• Lease: the mechanism that maintains a consistent mutation order across the replicas.
• The master grants a chunk lease to one of the replicas, called the primary.
• The primary picks a serial order for all mutations to the chunk.
• All replicas follow the order chosen by the primary when applying mutations.

Atomic record appends
(Slide figure: records from clients A, B, and C appended to three replicas, landing at offsets 4 and 5.)

Snapshot
• Makes a copy of a file or a directory tree while minimizing interruptions to ongoing mutations.
(Slide figure: snapshot via copy-on-write; the lease is with primary chunk server C, and chunk C is duplicated as C' on the primary and secondary chunk servers.)

Master Operation
• The master executes all namespace operations.
• It manages chunk replicas throughout the system.

1. Namespace management and locking
• To operate on /d1/d2/d3/.../dn/leaf, the master acquires read locks on the directory paths /d1, /d1/d2, ..., /d1/d2/d3/.../dn, and either a read lock or a write lock on the full pathname /d1/d2/d3/.../dn/leaf.
• Example: /home/user is being snapshotted to /save/user while /home/user/foo is being created.
  • Snapshot locks: read locks on /home and /save; write locks on /home/user and /save/user.
  • File-creation locks: read locks on /home and /home/user; write lock on /home/user/foo.
  • The two operations conflict on /home/user (write lock vs. read lock), so the file creation is serialized behind the snapshot.
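The locking scheme above can be sketched as a helper that computes an operation's lock set and checks for conflicts. The function names and the simplification that each operation locks a single pathname (a real snapshot also locks its destination paths, e.g. /save and /save/user) are illustrative assumptions, not GFS code:

```python
# Sketch of GFS-style namespace locking: an operation on /d1/d2/.../dn/leaf
# takes read locks on every ancestor directory path and a read or write
# lock on the full pathname itself. (Illustrative helper, not GFS code.)

def lock_set(path: str, leaf_mode: str):
    """Return (read_locks, write_locks) for an operation on `path`.

    leaf_mode is "read" or "write" -- the kind of lock taken on the full
    pathname; ancestor directories always get read locks.
    """
    parts = path.strip("/").split("/")
    ancestors = {"/" + "/".join(parts[:i]) for i in range(1, len(parts))}
    leaf = "/" + "/".join(parts)
    if leaf_mode == "write":
        return ancestors, {leaf}
    return ancestors | {leaf}, set()

def conflicts(op_a, op_b) -> bool:
    """Two lock sets conflict if any path is write-locked by one operation
    and locked (read or write) by the other."""
    ra, wa = op_a
    rb, wb = op_b
    return bool(wa & (rb | wb)) or bool(wb & (ra | wa))

# The slide's example: snapshotting /home/user write-locks /home/user,
# while creating /home/user/foo read-locks /home/user -- a conflict,
# so the two operations are serialized.
snapshot_src = lock_set("/home/user", "write")    # read: /home; write: /home/user
create_foo = lock_set("/home/user/foo", "write")  # read: /home, /home/user
print(conflicts(snapshot_src, create_foo))  # → True
```

Note that two concurrent file creations in the same directory do not conflict under this scheme, since both take only a read lock on the parent directory.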
2. Replica Placement
Replica placement serves two purposes:
1. Maximize data reliability and availability.
2. Maximize network bandwidth utilization.
Replicas are spread not only across machines but also across racks, which ensures that some replicas of a chunk survive even if an entire rack is damaged.

3. Creation, Re-replication, Rebalancing
Chunk replicas are created for three reasons: chunk creation, re-replication, and rebalancing.
1. Chunk creation. The master considers where to place the new replicas:
  1. Place new replicas on chunk servers with below-average disk space utilization.
  2. Limit the number of recent creations on each chunk server.
  3. Spread replicas of a chunk across racks.
2. Re-replication. The master re-replicates a chunk when:
  1. a chunk server becomes unavailable;
  2. a chunk server reports that its replica may be corrupted;
  3. one of its disks is disabled because of errors; or
  4. the replication goal is increased.
Chunks are re-replicated in priority order.
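The three chunk-creation heuristics above can be sketched as a small placement function. The data model, the greedy rack-spreading strategy, and all names here are illustrative assumptions, not the actual GFS policy code:

```python
# Sketch of the chunk-creation placement heuristics: prefer chunk servers
# with below-average disk utilization, skip servers with many recent
# creations, and spread replicas across racks. (Illustrative, not GFS code.)

from dataclasses import dataclass

@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_utilization: float   # fraction of disk in use, 0.0 - 1.0
    recent_creations: int     # creations since the last check

def place_new_replicas(servers, n_replicas=3, max_recent=2):
    avg_util = sum(s.disk_utilization for s in servers) / len(servers)
    # Rules 1 and 2: below-average utilization, few recent creations.
    candidates = [s for s in servers
                  if s.disk_utilization <= avg_util
                  and s.recent_creations < max_recent]
    candidates.sort(key=lambda s: s.disk_utilization)
    # Rule 3: greedily pick candidates on distinct racks first.
    chosen, used_racks = [], set()
    for s in candidates:
        if s.rack not in used_racks:
            chosen.append(s)
            used_racks.add(s.rack)
        if len(chosen) == n_replicas:
            break
    # If there are fewer racks than replicas, fill from remaining candidates.
    for s in candidates:
        if len(chosen) == n_replicas:
            break
        if s not in chosen:
            chosen.append(s)
    return [s.name for s in chosen]

servers = [
    ChunkServer("cs1", "rack-a", 0.30, 0),
    ChunkServer("cs2", "rack-a", 0.20, 0),
    ChunkServer("cs3", "rack-b", 0.35, 0),
    ChunkServer("cs4", "rack-b", 0.90, 0),   # above-average utilization: skipped
    ChunkServer("cs5", "rack-c", 0.10, 5),   # too many recent creations: skipped
]
print(place_new_replicas(servers))  # → ['cs2', 'cs3', 'cs1']
```

The example shows the trade-off in action: cs5 has the emptiest disk but is throttled by its recent-creation count, so the two distinct-rack picks come first and the third replica falls back to an already-used rack.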

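The lease-and-mutation-order mechanism described earlier under System Interactions can also be sketched in code. The classes below are illustrative stand-ins for chunk replicas, not GFS interfaces:

```python
# Sketch of lease-based mutation ordering: the master grants a lease to one
# replica (the primary); the primary assigns a serial number to each mutation,
# and every replica applies mutations in that serial order. (Illustrative.)

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []          # mutations applied, in order

    def apply(self, serial, data):
        self.log.append((serial, data))

class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def mutate(self, data):
        # The primary picks a single serial order for all mutations and
        # forwards that order to every secondary replica.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data)
        for r in self.secondaries:
            r.apply(serial, data)
        return serial

secondaries = [Replica("s1"), Replica("s2")]
primary = Primary("primary", secondaries)   # holds the lease from the master
for record in ["x", "y", "z"]:
    primary.mutate(record)

# Every replica saw the same mutations in the same order.
assert primary.log == secondaries[0].log == secondaries[1].log
print(primary.log)  # → [(0, 'x'), (1, 'y'), (2, 'z')]
```

Because concurrent clients never pick offsets or ordering themselves, the replicas stay byte-identical for serialized mutations; this is what makes GFS's relaxed consistency model workable.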
