UT Dallas CS 6350 - HBaseRevised

HBASE – THE SCALABLE DATA STORE
An Introduction to HBase
XLDB Europe Workshop 2013: CERN, Geneva
James Kinley
EMEA Solutions Architect, Cloudera

"Apache HBase is the Hadoop database, a distributed, scalable, big data store."
(The Apache Software Foundation)

Why Hadoop and HBase?
• Datasets are constantly growing, and intake is soaring
• CERN stores 100PB of physics data, with 75PB generated in the past 3 years
• Traditional databases are expensive to scale and inherently difficult to distribute
• Commodity hardware is cheap and powerful
• Hadoop…
  • Is designed to store and process extremely large datasets in batch
  • Is not intended for realtime querying
  • Does not support random access

History of Hadoop and HBase
• Google solved its scalability problems
  • "The Google File System" published October 2003
    • Hadoop DFS
  • "MapReduce: Simplified Data Processing on Large Clusters" published December 2004
    • Hadoop MapReduce
  • "BigTable: A Distributed Storage System for Structured Data" published November 2006
    • HBase

What is HBase?
• Distributed
• Column-Oriented
• Multi-Dimensional
• High-Availability (CAP?)
• High-Performance
• Storage System
• Project Goals:
  • Billions of Rows × Millions of Columns × Thousands of Versions
  • Petabytes of data stored across thousands of commodity servers

HBase is not…
• A SQL Database
  • No native query engine, no SQL, no types, no joins
  • Transactions and secondary indexes are available only as add-ons, and those are immature
• A drop-in replacement for your RDBMS
  • You must be OK with an RDBMS anti-schema:
    • Denormalized data
    • Wide and sparsely populated tables
  • Just say "no" to your DBA

[Figures: HBase tables, slides 8-19]

HBase tables
• Tables are sorted by Row Key in lexicographical order
• The table schema defines only its Column Families
• Each family consists of any number of Columns
• Each column consists of any number of Versions
• Columns exist only once they are inserted; there are no NULLs
• Columns within a family are sorted and stored together
• Everything except the table name is a byte[]
• (Table > Row Key > Family:Column > Timestamp) > Value

HBase Architecture
• A table is made up of any number of regions
• A region is specified by its startKey and endKey
• Each region may live on a different node and is made up of several HDFS files and blocks
• Two types of node: Master and RegionServer
• Special tables -ROOT- and .META. store schema information and region locations
• The Master server monitors RegionServers and handles region assignment and load balancing
• Uses ZooKeeper for distributed coordination

[Figure: HBase Architecture, slide 22]

Impala
• Open-source, general-purpose SQL query engine
• Runs directly within Hadoop:
  • Reads widely used Hadoop file formats and HBase tables
  • Talks to widely used Hadoop storage managers
  • Runs on the same nodes that run Hadoop processes
• High performance:
  • C++ instead of Java
  • Runtime code generation (LLVM)
  • A completely new execution engine that doesn't build on MapReduce

Thank You!
James Kinley, EMEA Solutions Architect,
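The "HBase tables" slide describes the data model as (Table > Row Key > Family:Column > Timestamp) > Value. The following is a minimal in-memory sketch of that nesting, assuming a hypothetical `ToyTable` class: its name, its `put`/`get`/`scan` methods, the `max_versions` default and the "newest version at or before `ts`" lookup rule are all illustrative assumptions, not the HBase client API (real HBase persists data in HDFS files and exposes its own Java API). It shows the slide's points: only column families are declared up front, columns appear on first write (nothing is stored for an unwritten column, so there are no NULLs), each cell keeps several timestamped versions, and rows come back in lexicographic byte order.

```python
from collections import defaultdict
from typing import Optional


class ToyTable:
    """Hypothetical in-memory model of HBase's data model (not the HBase client API).

    Layout: row key -> (family, qualifier) -> {timestamp: value}, all keys bytes.
    """

    def __init__(self, families, max_versions=3):
        # The schema declares only column families; columns appear on first write.
        self.families = set(families)
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: bytes, family: bytes, qualifier: bytes, ts: int, value: bytes):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        versions = self.rows[row][(family, qualifier)]
        versions[ts] = value
        # Retain only the newest max_versions versions of this cell.
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, row: bytes, family: bytes, qualifier: bytes, ts: Optional[int] = None):
        # Toy semantics: newest version with timestamp <= ts (or newest overall).
        # A column that was never written returns None; nothing is stored for it.
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start: bytes = b"", stop: Optional[bytes] = None):
        # Rows are visited in lexicographic order of their raw bytes,
        # so b"row10" sorts before b"row2".
        for row in sorted(self.rows):
            if row >= start and (stop is None or row < stop):
                # Columns within a row are yielded in sorted order as well.
                yield row, {col: self.get(row, *col) for col in sorted(self.rows[row])}


if __name__ == "__main__":
    t = ToyTable(families=[b"info"])
    t.put(b"row10", b"info", b"name", 1, b"alice")
    t.put(b"row2", b"info", b"name", 1, b"bob")
    t.put(b"row2", b"info", b"name", 2, b"robert")
    print([row for row, _ in t.scan()])           # [b'row10', b'row2']
    print(t.get(b"row2", b"info", b"name"))        # b'robert'
    print(t.get(b"row2", b"info", b"name", ts=1))  # b'bob'
```

The byte-wise ordering is why row-key design matters in HBase: numeric IDs written as plain text do not sort numerically unless they are zero-padded or encoded as fixed-width binary.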
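The "HBase Architecture" slide says a table is split into regions, each identified by its startKey and endKey, and that region locations are recorded in a special catalog table (.META. in the release the slides describe; newer HBase releases replaced -ROOT- and .META. with a single hbase:meta table). The sketch below shows how a row key can be mapped to its region with a binary search over sorted start keys. It assumes a hypothetical, hard-coded `REGIONS` list and made-up server names in place of a real catalog lookup, and treats each region as the half-open range [startKey, endKey), with an empty start key for the first region and no end key for the last.

```python
import bisect

# Hypothetical region layout for one table: each region covers
# [start_key, end_key); the first region starts at b"" and the last
# region has no end key (represented here as None).
REGIONS = [
    (b"", b"g", "regionserver-1"),
    (b"g", b"p", "regionserver-2"),
    (b"p", None, "regionserver-3"),
]

# Start keys in sorted order, used for the binary search.
START_KEYS = [start for start, _, _ in REGIONS]


def locate_region(row_key: bytes):
    """Return the (start, end, server) tuple of the region that holds row_key."""
    # bisect_right finds the last region whose start key is <= row_key.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    start, end, _ = REGIONS[idx]
    # Sanity check: the key must fall inside the half-open range.
    assert start <= row_key and (end is None or row_key < end)
    return REGIONS[idx]


print(locate_region(b"apple")[2])  # regionserver-1
print(locate_region(b"g")[2])      # regionserver-2 (start keys are inclusive)
print(locate_region(b"zebra")[2])  # regionserver-3
```

Because regions partition the sorted key space into contiguous ranges, a lookup needs only the start keys; this is also why a range scan touches only the regions whose ranges overlap the requested keys.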

