UT Dallas CS 6350 - 10.HiveBigData


HIVE: A warehouse solution over Map-Reduce framework
Dony Ang, Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff and Raghotham Murthy
01/18/2019

Overview
  - background
  - what is Hive
  - Hive DB
  - Hive architecture
  - Hive datatypes
  - HiveQL
  - Hive components
  - execution flows
  - compiler in detail
  - pros and cons
  - conclusion

Background
  - The size of the datasets collected and analyzed for business intelligence is growing rapidly, making traditional warehousing more expensive.
  - Hadoop is a popular open-source map-reduce implementation, used as an alternative to store and process extremely large data sets on commodity hardware.
  - However, map-reduce itself is very low level and requires developers to write custom code.

General Ecosystem of DW
  [Figure: layers of a data warehouse ecosystem. Labels: Reporting / BI layer, SQL, M/R, Hadoop, ETL]

What is Hive
  - Open-source DW solution built on top of Hadoop.
  - Supports a SQL-like declarative language called HiveQL, whose queries are compiled into map-reduce jobs executed on Hadoop.
  - Also supports custom map-reduce scripts plugged into a query.
  - Includes a system catalog, the Hive Metastore, used for query optimization and data exploration.

Hive Database: Data Model
  - Tables
      - Analogous to tables in a relational database.
      - Each table has a corresponding HDFS directory.
      - Data is serialized and stored in files within that directory.
      - External tables are supported on data stored in HDFS, NFS or a local directory.
  - Partitions
      - A table can have 1 or more partitions (1-level), which determine the distribution of data within subdirectories of the table directory.
      - e.g. table T is stored under /wh/T and is partitioned on columns ds and ctry. For ds=20090101, ctry=US, the data is stored within the directory /wh/T/ds=20090101/ctry=US.
  - Buckets
      - Data in each partition is divided into buckets based on the hash of a column in the table.
      - Each bucket is stored as a file in the partition directory.

Hive Datatypes
  - Supports primitive column types: integer, floating point, string, date, boolean.
  - Also supports nestable collections such as array or map.
  - Users can also define their own types programmatically.

Data Units
  - Databases
  - Tables
  - Partitions
  - Buckets (or Clusters)

Type System
  - Primitive types
      - Integers: TINYINT, SMALLINT, INT, BIGINT
      - Boolean: BOOLEAN
      - Floating point numbers: FLOAT, DOUBLE
      - String: STRING
  - Complex types
      - Structs: {a INT; b INT}
      - Maps: M['group']
      - Arrays: ['a', 'b', 'c'], A[1] returns 'b'

Examples: DDL Operations
    CREATE TABLE sample (foo INT, bar STRING) PARTITIONED BY (ds STRING);
    SHOW TABLES '.*s';
    DESCRIBE sample;
    ALTER TABLE sample ADD COLUMNS (new_col INT);
    DROP TABLE sample;

Examples: DML Operations
    LOAD DATA LOCAL INPATH 'sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');
    LOAD DATA INPATH '/user/falvariz/hive/sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');

SELECTS and FILTERS
    SELECT foo FROM sample WHERE ds='2012-02-24';
    INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM sample WHERE ds='2012-02-24';
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out' SELECT * FROM sample;

HiveQL
  - Supports a SQL-like query language called HiveQL for select, join, aggregate, union all and sub-queries in the FROM clause.
  - Supports DDL statements such as CREATE TABLE with serialization format, partitioning and bucketing columns.
  - Commands to load data from external sources and INSERT into Hive tables:
        LOAD DATA LOCAL INPATH '/logs/status_updates' INTO TABLE status_updates PARTITION (ds='2009-03-20')
  - Does NOT support UPDATE and DELETE.

HiveQL (cont.)
  - Supports multi-table INSERT:
        FROM (SELECT a.status, b.school, b.gender
              FROM status_updates a JOIN profiles b
              ON (a.userid = b.userid AND a.ds='2009-03-20')) subq1
        INSERT OVERWRITE TABLE gender_summary PARTITION (ds='2009-03-20')
            SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
        INSERT OVERWRITE TABLE school_summary PARTITION (ds='2009-03-20')
            SELECT subq1.school, COUNT(1) GROUP BY subq1.school
  - Also supports user-defined column transformation (UDF) and aggregation (UDAF) functions written in Java.

Aggregations and Groups
    SELECT MAX(foo) FROM sample;
    SELECT ds, COUNT(*), SUM(foo) FROM sample GROUP BY ds;
    FROM sample s INSERT OVERWRITE TABLE bar SELECT s.bar, count(*) WHERE s.foo > 0 GROUP BY s.bar;

Join
    CREATE TABLE customer (id INT, name STRING, address STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '...';
    CREATE TABLE order_cust (id INT, cus_id INT, prod_id INT, price INT)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    SELECT * FROM customer c JOIN order_cust o ON (c.id = o.cus_id);

    SELECT c.id, c.name, c.address, ce.exp
    FROM customer c JOIN
         (SELECT cus_id, sum(price) AS exp FROM order_cust GROUP BY cus_id) ce
    ON (c.id = ce.cus_id);

Multi-table insert / Dynamic partition insert
  Multi-table insert (one INSERT clause per static partition):
    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='US')
        SELECT pvs.viewTime, ... WHERE pvs.country = 'US'
    INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='CA')
        SELECT pvs.viewTime, ... WHERE pvs.country = 'CA'
    INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='UK')
        SELECT pvs.viewTime, ... WHERE pvs.country = 'UK'

  Dynamic partition insert (the country partition is taken from the data):
    FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country)
        SELECT pvs.viewTime, ...

  https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsert

HIVE Architecture
  [Figure: Hive architecture diagram]

HIVE Components
  - External interfaces: user interfaces (both CLI and Web UI) and APIs such as JDBC and ODBC.
  - Hive Thrift Server: a simple client API to execute HiveQL statements.
  - Metastore: the system catalog.
  - Driver: manages the lifecycle of a HiveQL statement through compilation, optimization and execution.

Execution Flow
  [Figure: execution flow diagram]
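
The data model slides describe how partition values map to subdirectories of the table directory and how rows within a partition are split into buckets by hashing a column. The short Python sketch below illustrates that layout only. It is not Hive's implementation: the function names (partition_dir, bucket_for, bucket_rows) are hypothetical, and CRC32 is used as a stand-in hash, which is an assumption since the slides do not say which hash function Hive applies.

```python
# Illustrative sketch of the partition/bucket layout described in the slides.
# All names here are hypothetical; this is not Hive code.
import zlib
from pathlib import PurePosixPath


def partition_dir(table_dir, partition_spec):
    """Return the directory for one partition: <table_dir>/<col>=<val>/..."""
    path = PurePosixPath(table_dir)
    for column, value in partition_spec:
        path = path / f"{column}={value}"
    return str(path)


def bucket_for(value, num_buckets):
    """Pick a bucket from the hash of a column value.

    CRC32 is an assumed stand-in hash chosen because it is deterministic
    across runs; Hive's real hash function is not given in the slides.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucket_rows(rows, column, num_buckets):
    """Group the rows of one partition into buckets keyed by bucket index."""
    buckets = {index: [] for index in range(num_buckets)}
    for row in rows:
        buckets[bucket_for(row[column], num_buckets)].append(row)
    return buckets


if __name__ == "__main__":
    # The example from the slides: table T under /wh/T, partitioned on ds, ctry.
    print(partition_dir("/wh/T", [("ds", "20090101"), ("ctry", "US")]))

    rows = [{"userid": uid, "status": "ok"} for uid in range(10)]
    for index, members in bucket_rows(rows, "userid", 4).items():
        print(index, [row["userid"] for row in members])
```

In this sketch each bucket would correspond to one file inside the partition directory, and rows with equal values in the bucketing column always land in the same bucket.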

