UT Dallas CS 6350 - BigDataHadoop_PPT_Lesson09


Copyright 2014, Simplilearn, All rights reserved.

Lesson 9: HBase
Big Data and Hadoop Developer

Objectives
By the end of this lesson, you will be able to:
● Explain HBase architecture
● Describe the HBase data model
● Identify the steps to install HBase
● Explain how to insert data and query data from HBase

HBase: Introduction
Apache HBase is a distributed column-oriented database built on top of HDFS (Hadoop Distributed File System). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. HBase is used when random, real-time read/write access is needed for Big Data.
Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data".
Note: The goal of HBase is the hosting of very large tables, with billions of rows and millions of columns, atop clusters of commodity hardware.

Characteristics of HBase
HBase is a NoSQL database and is classified as a key-value store. In HBase:
● Each value is identified by a key.
● Both key and value are byte arrays.
● Values are stored in key order.
● Values can be accessed very quickly by their keys.
Note: HBase tables have no fixed schema. Column families, not columns, are defined at the time of table creation.

Companies Using HBase
Some of the companies that use HBase as their core program are:
[Image: company logos]

HBase Architecture
HBase has two types of nodes: Master and RegionServer. Following are the characteristics of the two nodes.
Master
● Only one Master node is active at a time. Its high availability is maintained with ZooKeeper.
● It manages cluster operations like region assignment, load balancing, and splitting.
● It is not a part of the read/write path.
RegionServer
● One or more RegionServers can exist at a time.
● Each RegionServer hosts table regions, performs reads, and buffers writes.
● Clients communicate with RegionServers for read/write operations.
Note: A region in HBase is a subset of a table's rows. The Master node detects the status of RegionServers and assigns regions to RegionServers.

HBase Architecture (contd.)
The image represents the components of HBase: the HBase Master and the RegionServers.

Storage Model of HBase
Following are the facts related to the storage model of HBase:
Partitioning
● A table is horizontally partitioned into regions; each region is composed of a sequential range of keys.
● Each region is managed by a RegionServer.
● A RegionServer may hold multiple regions.
Persistence and data availability
● HBase stores its data in HDFS; it does not replicate RegionServers, and relies on HDFS replication for the availability of data.
● Region data is cached in memory.
o Updates are applied to the in-memory store (MemStore), which also serves reads of recently written data.
o MemStore is flushed periodically to HDFS.
o Write Ahead Log (WAL), stored in HDFS, is used for the durability of updates.

Row Distribution of Data between RegionServers
The image illustrates how the rows of a table are distributed across RegionServers. The logical view is all rows in a table, sorted by row key (A1, A2, A22, A3, ... K4, ... O90, ... Z30, Z55). Data is sliced into regions by key range and maintained in individual RegionServers, depending on the requirement of the user. The figure shows three RegionServers and the following regions:
● Region null -> A3
● Region A3 -> F34
● Region F34 -> K80
● Region K80 -> O95
● Region O95 -> null
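
As a minimal sketch of the key-range partitioning described above, the Python below finds the region that holds a row key by binary search over the region start keys. The boundaries are the ones named in the figure, with the fourth boundary read as O95. The RegionServer names (rs1 to rs3), their assignment to regions, and the function names are hypothetical. This is an illustration of the idea, not HBase's client code.

```python
from bisect import bisect_right

# Region start keys from the figure; b"" stands for the open ("null") lower bound.
REGION_START_KEYS = [b"", b"A3", b"F34", b"K80", b"O95"]

# Hypothetical assignment of the five regions to three RegionServers.
# A RegionServer may hold more than one region.
REGION_SERVERS = ["rs1", "rs1", "rs2", "rs2", "rs3"]


def find_region(row_key: bytes) -> int:
    """Return the index of the region whose [start, end) range contains row_key."""
    # Keys compare as raw bytes, so b"A22" sorts before b"A3".
    return bisect_right(REGION_START_KEYS, row_key) - 1


def locate(row_key: bytes) -> dict:
    """Describe the region and (hypothetical) RegionServer for row_key."""
    index = find_region(row_key)
    is_last = index + 1 == len(REGION_START_KEYS)
    return {
        "start": REGION_START_KEYS[index],
        "end": None if is_last else REGION_START_KEYS[index + 1],  # None = open end
        "server": REGION_SERVERS[index],
    }


if __name__ == "__main__":
    for key in (b"A1", b"A22", b"A3", b"K4", b"O90", b"Z55"):
        print(key, locate(key))
```

Because each region covers a contiguous, sorted range of keys, a lookup needs only the list of start keys, and a range scan touches only the regions whose ranges overlap the scan.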

Data Storage in HBase
Following are the facts related to data storage in HBase:
● Data is stored in files called HFiles/StoreFiles, saved in HDFS.
● An HFile is a key-value map.
● When data is added, it is written to a log called the Write Ahead Log (WAL) and stored in memory (MemStore).
● HFiles are immutable, since HDFS does not support updates to an existing file.
● HBase periodically performs data compactions to control the number of HFiles and to keep the cluster well-balanced.

Data Model
Following are the features of the data model in HBase:
● Tables are sorted by rows.
● During table creation, column families should be defined:
o Each family consists of any number of columns.
o Each column consists of any number of versions.
o Columns only exist when inserted; NULLs are free.
o Columns in a family are sorted and stored together.
● Everything except table names is stored as byte arrays.
Note: A value in a row is identified by a row key, a column family with a column qualifier, and a timestamp (version).

Data Model (contd.)
Following are other features of the data model:
● The starting identifier is a row key.
● Column families are associated with column qualifiers.
● Each cell has a timestamp and an associated value.

When to Use HBase
Following are the scenarios when you should use HBase:
● You have enough data: hundreds of millions or billions of rows.
● You have sufficient commodity hardware, with a minimum of five nodes.
● You need to carefully evaluate HBase for mixed workloads.
● You are using it for random selects and range scans by key.
● You are using the concept of variable schema.

HBase vs. RDBMS
The table shows a comparison between HBase and Relational Database Management System (RDBMS):
HBase | RDBMS
Automatic partitioning | Usually manual, admin-driven partitions
Scales linearly and automatically with new nodes | Usually scales vertically by adding more hardware resources
Uses commodity hardware | Relies on expensive servers
Has fault tolerance | Fault tolerance may or may not be present
Leverages batch processing with MapReduce distributed processing | Relies on multiple threads or processes rather than MapReduce distributed processing

Installation of HBase
You need to perform the following steps to install HBase:
Get the download link for HBase tar
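
Returning to the data model covered earlier in this lesson, the following Python is a toy, in-memory sketch of how a value is addressed by row key, column family, column qualifier, and timestamp. The class name `ToyTable` and its methods are hypothetical and are not part of any HBase API. The sketch only mirrors the ideas from the slides: families are fixed at table creation, columns exist only once inserted, each column keeps multiple versions, and rows are scanned in key order.

```python
from collections import defaultdict


class ToyTable:
    """Illustrative model only: row key -> (family, qualifier) -> {timestamp: value}."""

    def __init__(self, families):
        # Column families are declared up front; columns are not.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # A column comes into existence on first insert; each put adds a version.
        self.rows[row][(family, qualifier)][ts] = value

    def get(self, row, family, qualifier, ts=None):
        """Return the newest version at or before ts (newest overall if ts is None)."""
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start=b"", stop=None):
        """Yield row keys in sorted order within [start, stop)."""
        for row in sorted(self.rows):
            if row >= start and (stop is None or row < stop):
                yield row


if __name__ == "__main__":
    table = ToyTable(families=["info"])
    table.put(b"A2", "info", b"name", b"first", ts=1)
    table.put(b"A2", "info", b"name", b"second", ts=2)
    table.put(b"A1", "info", b"name", b"other", ts=1)
    print(table.get(b"A2", "info", b"name"))        # newest version
    print(table.get(b"A2", "info", b"name", ts=1))  # version as of timestamp 1
    print(list(table.scan()))                       # rows in key order
```

A missing column simply has no entry in the map, which is the sense in which NULLs cost nothing to store.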

