Big Data and Hadoop Developer
Lesson 9: HBase
Copyright 2014, Simplilearn. All rights reserved.

Objectives
By the end of this lesson, you will be able to:
- Explain HBase architecture
- Describe the HBase data model
- Identify the steps to install HBase
- Explain how to insert data into and query data from HBase

HBase Introduction
Apache HBase is a distributed, column-oriented database built on top of HDFS (Hadoop Distributed File System). It is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable ("Bigtable: A Distributed Storage System for Structured Data"). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
- HBase is used when random, real-time read/write access to Big Data is needed.
- The goal of HBase is to host very large tables, with billions of rows and millions of columns, atop clusters of commodity hardware.

Characteristics of HBase
HBase is a type of NoSQL database and is classified as a key-value store. In HBase:
- a value is identified by a key;
- both key and value are byte arrays;
- values are stored in key order;
- values can be accessed very quickly by their keys.
HBase tables have no fixed column schema: column families, not columns, are defined at the time of table creation.

Companies Using HBase
Some of the companies that use HBase as their core program are:
[Image: logos of companies using HBase]

HBase Architecture
HBase has two types of nodes: Master and RegionServer. Their characteristics are as follows.

Master
- Only one Master node runs at a time. Its high availability is maintained with ZooKeeper.
- It manages cluster operations such as assignment, load balancing, and splitting.
- It is not a part of the read/write path.

RegionServer
- One or more RegionServers can exist at a time.
- It hosts tables, performs reads, and buffers writes.
- Clients communicate with RegionServers for read/write operations.

A region in HBase is a subset of a table's rows. The Master node detects the status of RegionServers and assigns regions to them.

HBase Architecture (contd.)
[Image: the components of HBase, showing the HBase Master and RegionServers]

Storage Model of HBase
Following are the facts related to the storage model of HBase.

Partitioning
- A table is horizontally partitioned into regions; each region is composed of a sequential range of keys.
- Each region is managed by a RegionServer. A RegionServer may hold multiple regions.

Persistence and data availability
- HBase stores its data in HDFS. It does not replicate RegionServers and relies on HDFS replication for the availability of data.
- Region data is cached in memory:
  o Updates and reads are served from the in-memory cache (MemStore).
  o The MemStore is flushed periodically to HDFS.
  o A Write Ahead Log (WAL), stored in HDFS, is used for the durability of updates.

Row Distribution of Data between RegionServers
The image illustrates the distribution of rows in structured data using HBase. Data is sliced and maintained in individual RegionServers depending on the requirement of the user.
Logical view (all rows in a table): A1, A2, A22, A3, K4, O90, Z30, Z55
Regions, spread across the RegionServers:
- Region null to A3
- Region A3 to F34
- Region F34 to K80
- Region K80 to O95
- Region O95 to null

Data Storage in HBase
Following are the facts related to data storage in HBase:
- Data is stored in files called HFiles (StoreFiles), saved in HDFS. An HFile is a key-value map.
- When data is added, it is written to a log called the Write Ahead Log and stored in memory (MemStore).
- HFiles are immutable, since HDFS does not support updates to an existing file.
- HBase periodically performs data compactions to control the number of HFiles and to keep the cluster well balanced.

Data Model
Following are the features of the data model in HBase:
- Tables are sorted by row key.
- During table creation, column families should be defined:
  o Each family consists of any number of columns.
  o Each column consists of any number of versions.
  o Columns exist only when inserted; NULLs are free.
  o Columns in a family are sorted and stored together.
- Everything except table names is stored as byte arrays.
- A row value is identified by a row key, a column family with its columns, and a timestamp (version).

Data Model (contd.)
Following are other features of the data model:
- The starting identifier is the row key.
- Column families are associated with column qualifiers.
- Each cell in a row has a timestamp and an associated value.

When to Use HBase
Use HBase in the following scenarios:
- You have a variable schema.
- You have enough data: hundreds of millions or billions of rows.
- You have sufficient commodity hardware, with a minimum of five nodes.
- You need random selects and range scans by key.
For mixed workloads, evaluate HBase carefully.

HBase vs. RDBMS
The table shows a comparison between HBase and a Relational Database Management System (RDBMS).

HBase                                                          | RDBMS
Automatic partitioning                                         | Usually manual, admin-driven partitions
Scales linearly and automatically with new nodes               | Usually scales vertically by adding more hardware resources
Uses commodity hardware                                        | Relies on expensive servers
Has fault tolerance                                            | Fault tolerance may or may not be present
Leverages batch processing with MapReduce distributed processing | Relies on multiple threads or processes rather than MapReduce distributed processing

Installation of HBase
You need to perform the following steps to install HBase:
1. Get the download link for the HBase tar file from www.hbase.apache.org.
2. Download HBase in your server system.
3. Untar HBase in your server system.
4. Copy the extracted folder to /usr/local/hbase.
5. Add permissions.
6. Open the .bashrc file to include the settings.
7. Add the lines shown to the .bashrc file.
8. Refresh the .bashrc file.

Installation of HBase: Step 1
Get the download link for the HBase tar file from the
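
Sketch: locating the region for a row key
The row distribution slide describes regions as sequential key ranges. A minimal Python sketch of that idea follows. It is not HBase code: the function names are hypothetical, the boundary keys are taken from the slide's example (reading the last boundary as O95, which is an assumption), and keys are compared byte by byte, as the slides state that keys are byte arrays stored in key order.

```python
from bisect import bisect_right

# Region start keys in sorted (byte-wise) order. b"" stands for the open
# ("null") lower bound of the first region. Boundaries follow the slide's
# example; O95 is an assumed reading of the last boundary.
REGION_START_KEYS = [b"", b"A3", b"F34", b"K80", b"O95"]


def find_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right counts the start keys <= row_key; the last of those
    # is the start key of the region that owns the row.
    return bisect_right(REGION_START_KEYS, row_key) - 1


def describe_region(index: int) -> str:
    """Format a region as a half-open range [start, end)."""
    start = REGION_START_KEYS[index]
    has_next = index + 1 < len(REGION_START_KEYS)
    end = REGION_START_KEYS[index + 1] if has_next else b""
    return f"[{start.decode() or 'null'}, {end.decode() or 'null'})"


if __name__ == "__main__":
    for key in [b"A1", b"A2", b"A22", b"A3", b"K4", b"O90", b"Z30", b"Z55"]:
        print(key.decode(), "->", describe_region(find_region(key)))
```

Because the comparison is byte-wise, A22 sorts before A3 and K4 sorts before K80, so a row's region is not what numeric ordering would suggest.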
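
Sketch: versioned cells, MemStore, and HFiles
The data model and data storage slides say that a value is identified by row key, column family, column, and timestamp, and that writes go to the Write Ahead Log and the MemStore before being flushed to immutable HFiles and later compacted. The toy Python model below illustrates those statements only. The class ToyRegion and its methods are hypothetical names, everything lives in memory, and real HBase is considerably more involved.

```python
from typing import Dict, List, Optional, Tuple

# (row key, column family, column qualifier, timestamp)
CellKey = Tuple[bytes, bytes, bytes, int]
Cell = Tuple[CellKey, bytes]


class ToyRegion:
    """Hypothetical in-memory model of one region; not HBase's implementation."""

    def __init__(self) -> None:
        self.wal: List[Cell] = []                 # append-only log of every update
        self.memstore: Dict[CellKey, bytes] = {}  # recent writes held in memory
        self.hfiles: List[Tuple[Cell, ...]] = []  # immutable, key-sorted files

    def put(self, row: bytes, family: bytes, qualifier: bytes,
            timestamp: int, value: bytes) -> None:
        key = (row, family, qualifier, timestamp)
        self.wal.append((key, value))  # log first, for durability
        self.memstore[key] = value     # then buffer the write in memory

    def flush(self) -> None:
        """Write the MemStore out as a new immutable, key-sorted file."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def compact(self) -> None:
        """Merge all files into one to keep the number of files small."""
        merged: Dict[CellKey, bytes] = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        self.hfiles = [tuple(sorted(merged.items()))] if merged else []

    def get(self, row: bytes, family: bytes, qualifier: bytes) -> Optional[bytes]:
        """Return the value with the newest timestamp for the given column."""
        best: Optional[Tuple[int, bytes]] = None
        for source in [self.memstore.items(), *self.hfiles]:
            for (r, f, q, ts), value in source:
                if (r, f, q) == (row, family, qualifier):
                    if best is None or ts > best[0]:
                        best = (ts, value)
        return None if best is None else best[1]
```

In this model, putting two versions of the same column and then calling get returns the value with the higher timestamp, whether the versions sit in the MemStore or in flushed files. A column that was never inserted simply has no cell, which mirrors the statement that NULLs are free.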