BigTable Distributed storage for structured data Fay Chang Jeffrey Dean Sanjay Ghemawat Wilson C Hsieh Deborah A Wallach Michael Burrows Tushar Chandra Andrew Fikes and Robert Gruber 2006 Bigtable A Distributed Storage System for Structured Data Proceedings of the 7th Symposium on Operating System Design and Implementation OSDI 2006 pages 205 218 http the paper trail org blog bigtable googles distributed data sto re http read seas harvard edu cs261 2011 bigtable html http www cs rutgers edu pxk 417 notes content bigtable html 1 BigTable Overview Goals scalability petabytes of data thousands of machines applicability to Google applications Google Analytics Google Earth not a general storage model high performance high availability Structure uses GFS for storage uses Chubby for coordination Note figure from presentation by Jeff Dean Google 2 BigTable Data Model row string column string timestamp int64 string A BigTable is a sparse distributed persistent multidimensional sorted map The map is indexed by a row key a column key and a timestamp each value in the map is an uninterpreted array of bytes Row keys up to 64K 10 100 bytes typical lexicographically ordered reading adjacent row ranges efficient organized into tablets row ranges Column keys grouped into column families family qualifier column family is basis for access control 3 BigTable Column Families Column keys are grouped into sets called column families A column family must be created before data can be stored in a column key Hundreds of static column families Syntax is family key e g Language English Language German etc BigTable Timestamps 64 bit integers Assigned by Bigtable real time in microseconds Client application when unique timestamps are a necessity Items in a cell are stored in decreasing timestamp order Application specifies how many versions n of data items are maintained in a cell Bigtable garbage collects obsolete versions BigTable Data Model row string column string timestamp int64 string Timestamps automatically assigned real time or application defined used in garbage collection last n n most recent since time Transactions iterator style interface for read operation atomic single row updates no support for multi row updates no general relational model 6 BigTable Data Model Row Timestamp Column family animal animal type enclosure1 enclosure2 animal size t2 zebra t1 lion big Column family repairs repairs cost 1000 EUR We may have a huge number e g hundreds of thousands or millions of columns but the column family for each row will have only a tiny fraction of them populated While the number of column families will typically be small in a table at most hundreds the number of columns is unlimited BigTable Data Model Column family animal enclosure1 t2 animal type zebra enclosure1 t1 animal size big enclosure1 t1 animal type lion Column family repairs enclosure1 t1 repairs cost 1000 EUR BigTable Bigtable Used in different applications supported by Google BigTable Application 1 Google Analytics Enables webmasters to analyze traffic pattern at their web sites Statistics such as Number of unique visitors per day and the page views per URL per day Percentage of users that made a purchase given that they earlier viewed a specific page How A small JavaScript program that the webmaster embeds in their web pages Every time the page is visited the program is executed Program records the following information about each request User identifier The page being fetched BigTable Application 1 Google Analytics Cont Two of the Bigtables Raw click table 200 TB A row for each end user session Row name include website s name and the time at which the session was created Clustering of sessions that visit the same web site And a sorted chronological order Compression factor of 6 7 Summary table 20 TB Stores predefined summaries for each web site Generated from the raw click table by periodically scheduled MapReduce jobs Each MapReduce job extracts recent session data from the raw click table Row name includes website s name and the column family is the aggregate summaries Compression factor is 2 3 BigTable Application 2 Google Earth Maps Functionality Pan view and annotate satellite imagery at different resolution levels One Bigtable stores raw imagery 70 TB Row name is a geographic segments Names are chosen to ensure adjacent geographic segments are clustered together Column family maintains sources of data for each segment There are different sets of tables for serving client data e g index table BigTable Application 3 Personalized Search Records user queries and clicks across Google properties Users browse their search histories and request for personalized search results based on their historical usage patterns One Bigtable Row name is userid A column family is reserved for each action type e g web queries clicks User profiles are generated using MapReduce These profiles personalize live search results Replicated geographically to reduce latency and increase availability BigTable Data Model 14 BigTable Data Model 15 BigTable Table implementation a f g k v z table a f tablet g k tablet v z tablet a table is divided into a set of tablets each storing a set of consecutive rows tablets typically 100 200MB 16 BigTable Table implementation g k tablet SSTable SSTable SSTable SSTable 64K Block 64K Block 64K Block index a tablet is stored as a set of SSTables an SSTable has a set of 64K blocks and an index each SSTable is a GFS file 17 BigTable Table Implementation 18 BigTable Implementation 19 BigTable APIs Metadata operations Writes Create delete tables column families change metadata Set write cells in a row DeleteCells delete cells in a row DeleteRow delete all cells in a row Reads Scanner read arbitrary cells in a bigtable Each row read is atomic Can restrict returned rows to a particular range Can ask for just data from 1 row all rows etc Can ask for all columns just certain column families or specific columns 01 14 2019 20 BigTable Building Blocks Google File System GFS stores persistent data SSTable file format Scheduler schedules jobs onto machines Chubby Lock service distributed lock manager master election location bootstrapping MapReduce optional Data processing Read write Bigtable 01 14 2019 data 21 BigTable Implementation Single master distributed system Three major components Library that linked into every client One master server Assigning tablets to tablet servers Detecting addition and expiration of tablet servers
View Full Document