DOC PREVIEW
UT Dallas CS 6350 - BigTable (1)

This preview shows page 1-2-3-18-19-37-38-39 out of 39 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 39 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1OverviewData ModelColumn FamiliesTimestampsData ModelData ModelData ModelBigtableApplication 1: Google AnalyticsApplication 1: Google Analytics (Cont…)Application 2: Google Earth & MapsApplication 3: Personalized SearchData ModelData ModelTable implementationTable implementationTable ImplementationImplementationAPIsBuilding BlocksImplementationImplementationChubbySoftware InfrastructureBigTable & GFSFinding TabletsLocation of Tablets (Ranges)Placement of TabletsPlacement of TabletsMasterClient Write & Read OperationsTablet operationsRefinement – Locality groups & CompressionWrite OperationsCompactionsMinor compactionMerging compactionMajor compactionBigTableDistributed storage for structured dataFay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. (2006) Bigtable: A Distributed Storage System for Structured Data. Proceedings of the 7th Symposium on Operating System Design and Implementation (OSDI 2006), pages 205-218.1http://the-paper-trail.org/blog/bigtable-googles-distributed-data-store/http://read.seas.harvard.edu/cs261/2011/bigtable.htmlhttp://www.cs.rutgers.edu/~pxk/417/notes/content/bigtable.htmlBigTableOverviewGoalsscalabilitypetabytes of datathousands of machinesapplicabilityto Google applicationsGoogle AnalyticsGoogle Earth… not a general storage modelhigh performancehigh availabilityStructureuses GFS for storageuses Chubby for coordination2Note: figure from presentation by Jeff Dean (Google)BigTableData Model3(row: string, column: string, timestamp: int64)  stringA BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytesRow keysup to 64K, 10-100 bytes typicallexicographically orderedreading adjacent row ranges efficientorganized into tablets: row rangesColumn keysgrouped into column families - family:qualifiercolumn family is basis for access controlBigTableColumn FamiliesColumn keys are grouped into sets called column families.A column family must be created before data can be stored in a column key.Hundreds of static column families.Syntax is family:key, e.g., Language:English, Language:German, etc.BigTableTimestamps64 bit integersAssigned by:Bigtable: real-time in microseconds,Client application: when unique timestamps are a necessity.Items in a cell are stored in decreasing timestamp order.Application specifies how many versions (n) of data items are maintained in a cell.Bigtable garbage collects obsolete versions.BigTableData Model6(row: string, column: string, timestamp: int64)  stringTimestampsautomatically assigned (real-time) or application definedused in garbage collection (last n, n most recent, since time)Transactionsiterator-style interface for read operationatomic single-row updatesno support for multi-row updatesno general relational modelBigTableData ModelRow TimestampColumn family:animal:Column familyrepairs:animal:type animal:size repairs:costenclosure1t2 zebra 1000 EURt1 lion bigenclosure2 … … … …We may have a huge number (e.g., hundreds of thousands or millions) of columns but the column family for each row will have only a tiny fraction of them populated. While the number of column families will typically be small in a table (at most hundreds), the number of columns is unlimited.BigTableData ModelColumn family animal:(enclosure1, t2, animal:type) zebra(enclosure1, t1, animal:size) big(enclosure1, t1, animal:type) lionColumn family repairs:(enclosure1, t1, repairs:cost) 1000 EURBigTableBigtableUsed in different applications supported by Google.BigTableApplication 1: Google AnalyticsEnables webmasters to analyze traffic pattern at their web sites. Statistics such as:Number of unique visitors per day and the page views per URL per day,Percentage of users that made a purchase given that they earlier viewed a specific page.How? A small JavaScript program that the webmaster embeds in their web pages.Every time the page is visited, the program is executed.Program records the following information about each request:User identifierThe page being fetchedBigTableApplication 1: Google Analytics (Cont…)Two of the BigtablesRaw click table (~ 200 TB)A row for each end-user session.Row name include website’s name and the time at which the session was created.Clustering of sessions that visit the same web site. And a sorted chronological order.Compression factor of 6-7.Summary table (~ 20 TB)Stores predefined summaries for each web site.Generated from the raw click table by periodically scheduled MapReduce jobs.Each MapReduce job extracts recent session data from the raw click table.Row name includes website’s name and the column family is the aggregate summaries.Compression factor is 2-3.BigTableApplication 2: Google Earth & MapsFunctionality: Pan, view, and annotate satellite imagery at different resolution levels.One Bigtable stores raw imagery (~ 70 TB):Row name is a geographic segments. Names are chosen to ensure adjacent geographic segments are clustered together.Column family maintains sources of data for each segment.There are different sets of tables for serving client data, e.g., index table.BigTableApplication 3: Personalized SearchRecords user queries and clicks across Google properties.Users browse their search histories and request for personalized search results based on their historical usage patterns.One Bigtable:Row name is useridA column family is reserved for each action type, e.g., web queries, clicks.User profiles are generated using MapReduce.These profiles personalize live search results.Replicated geographically to reduce latency and increase availability.BigTableData Model14BigTableData Model15BigTableTable implementationa table is divided into a set of tablets, each storing a set of consecutive rowstablets typically 100-200MB16a...fg...k...v...za...fg...kv...ztabletablettablettabletBigTableTable implementation17g...kSSTableSSTable SSTable. . .64KBlock64KBlock64KBlock...indexSSTabletableta tablet is stored as a set of SSTablesan SSTable has a set of 64K blocks and an indexeach SSTable is a GFS fileBigTableTable


View Full Document

UT Dallas CS 6350 - BigTable (1)

Documents in this Course
HW3

HW3

5 pages

NOSQL-CAP

NOSQL-CAP

23 pages

BigTable

BigTable

39 pages

HW3

HW3

5 pages

Load more
Download BigTable (1)
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view BigTable (1) and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view BigTable (1) 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?