U of I CS 525 - BigTable A System for Distributed Structured Storage - D487425

Home> Schools> University of Illinois> Computer Science (CS) > CS 525> BigTable A System for Distributed Structured Storage

U of I CS 525 - BigTable A System for Distributed Structured Storage

School name University of Illinois

Course Cs 525- Advanced Graphics Processor Programming

Pages 45

Download Save

Unformatted text preview:

BigTable A System for Distributed Structured StorageMotivation: What BigTable is?Question: Why not just use commercial DB ?Goals of BigTableData model: a big mapBuilding BlocksTablet Serving StructureWhere data is storedTabletTableSlide 11Locating TabletsTablet ServingFault tolerance and load balancingImplementation Refinements - Locality GroupsRefinements - Shared Logs for editing a tableRefinements - CompressionMicrobenchmarks - speed per serverMicrobenchmarks - scalingApplication at GoogleDiscussionEnd Thank you!Slide 23Facebook Photo StorageLong Tail IssueMotivation for HaystackEnter HaystackEnter HaystackStep-through of OperationHaystackHaystack DirectoryHaystackHaystack CacheCache Hit RateHaystackHaystack ObjectA Closer Look at the Needles…Haystack StoreOperationHaystack Index FileHaystack Index FileExperimental DataProduction DataHaystack AdvantagesDiscussion/QuestionsBigTableA System for DistributedStructured StorageYanen LiDepartment of Computer ScienceUniversity of Illinois at [email protected]/08/2011cs525 course presentation, partial materials are from the internetMotivation: What BigTable is?•BigTable is a distributed Database•A more application-friendly storage service•Data in Google–URLs:•Contents, crawl metadata, links, anchors, pagerank,…–Per-user data:•User preference settings, recent queries/search results, …–Geographic locations:•Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …2Question: Why not just use commercial DB ?•TeraData, Oracle, MySql with sharding•Data is large–Billions of URLs, many versions/page (~20K/version)–Hundreds of millions of users, thousands of q/sec–100TB+ of satellite image data•Many incoming requests•Scale to thousands of machines - 450,000 machines (NYTimes estimate, June 14th 2006)•No commercial system big enough–Cost is too high–Might not have made appropriate design choices–Low-level storage optimizations help improving performance 3Goals of BigTable•Need to support:–Data is highly available at any time–Very high read/write rates–Efficient scans over all or interesting subsets of data–Asynchronous and continuously updates–High Scalability4Data model: a big map•BigTable is a distributed multi-dimensional sparse map(row, column, timestamp)  cell contents•Provides lookup, insert, and delete API•Row keys and Column keys are strings•Arbitrary “columns”- Column family:qualifier - Column-oriented physical store•Does not support a relational model- No table-wide integrity constraints- No multirow transaction5Building Blocks•Scheduler (Google WorkQueue)•Google Filesystem - SSTable file format•Chubby - {lock/file/name} service - Coarse-grained locks - discover tablet server - store meta data of tablets•MapReduce6Tablet Serving StructureCluster Scheduling Masterhandles failover, monitoringGFSholds tablet data, logsLock serviceholds metadata,handles master-electionBigtable tablet serverserves dataRead/write, split tabletBigtable tablet server Bigtable tablet serverBigtable masterperforms metadata ops,load balancingBigtable cellBigtable clientBigtable clientlibraryOpen()7serves dataRead/write, split tabletserves dataRead/write, split tabletMultiple masters – Only 1 elected active master at any given point of time and others sitting to acquire master lockWhere data is storedSSTable ("Static and Sorted Table")•Immutable, sorted file of key-value string pairs•Chunks of data plus an index –Index is of block ranges, not values–triplicated across three machines in GFS Index64K block64K block64K blockSSTable8Tablet •Contains some range of rows of the table•Built out of multiple SSTables•Typical size: 100-200 MB•Tablets are stored in Tablet servers (~100 per server)Index64K block64K block64K blockSSTableIndex64K block64K block64K blockSSTableTabletStart:aardvark End:apple9Table•Multiple tablets make up the table•SSTables can be shared (pointer)•Tablets do not overlap, SSTables can overlapSSTable SSTable SSTable SSTableTabletaardvarkappleTabletapple_two_Eboat10Tablets & SplittingLarge tables broken into tablets at row boundaries“<html>…”aaa.comTABLETS“contents”ENcnn.comcnn.com/sports.html“language”Website.comZuppa.com/menu.html……“<html>…”aaa.comTABLETS“contents”ENcnn.comcnn.com/sports.html“language”Website.comZuppa.com/menu.html…Yahoo.com/kids.htmlYahoo.com/kids.html?D……11Locating TabletsApproach: 3-level hierarchical lookup scheme for tablets–Location is ip:port of relevant server, all stored in META tablets–1st level: bootstrapped from lock server, points to owner of META0–2nd level: Uses META0 data to find owner of appropriate META1 tablet–3rd level: META1 table holds locations of tablets of all other tables•META1 table itself can be split into multiple tablets12Client caches tablet locationsTablet ServingWrite buffer in memory(random-access)Append-only log on GFSSSTable on GFSSSTable on GFSSSTable on GFS(mmap)TabletWriteReadWrites - Updates committed to a commit log - Recently committed updates are stored in memory - Older updates are stored in a sequence of SSTables.Reads - form a merged view of SSTables in memory - read <key-value> pairFault tolerance and load balancing•Recovering tablet –New tablet server reads data from METADATA table.–Metadata contains list of SSTables and pointers into any commit log that may contain data for the tablet.–Server reads the indices of the SSTables in memory–Reconstructs the memtable by applying all of the updates since redo points.• Master responsible for load balancing and fault tolerance - GFS replicates data - Use Chubby to keep locks of tablet servers, restart failed servers - Master checks the status of tablet servers - Keep track of available tablet servers and unassigned tablets - if a server fails, start tablet recoveringImplementation Refinements- Locality Groups•Group column families together into an SSTable–Avoid mingling data, ie page contents and page metadata–Can keep some groups all in memory•Can compress locality groups•Bloom Filters on locality groups – avoid searching SSTable - space-efficient - can test membership of a set - False positives are possible but false negatives are not15Refinements - Shared Logs for editing a table•Mutations are logged, then applied to an in-memory version•Logfile stored in

View Full Document


School:
Email:
New Password:
Confirm Password:

U of I CS 525 - BigTable A System for Distributed Structured Storage

Sign up for free to view:

Please select your school