DOC PREVIEW
UT Dallas CS 6350 - HiveBigData

This preview shows page 1-2-20-21 out of 21 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Slide 1overviewbackgroundGeneral Ecosystem of DWwhat is hive ?Hive DatabaseHIVE Database cont.HIVE datatypeData UnitsType SystemExamples – DDL OperationsExamples – DML OperationsSELECTS and FILTERShiveQLhiveQL cont.Aggregations and GroupsJoinMulti table insert - Dynamic partition insertHIVE ArchitectureHIVE ComponentsExecution FlowHIVE A warehouse solution over map-reduce frameworkAshish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao,Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff and Raghotham MurthyDony Ang01/14/2019 1HIVE - A warehouse solution over Map Reduce Frameworkoverviewbackgroundwhat is HiveHive DBHive architectureHive datatypeshiveQLhive componentsexecution flowscompiler in detailspros and consconclusion01/14/2019 2HIVE - A warehouse solution over Map Reduce FrameworkbackgroundSize of collected and analyzed datasets for business intelligence is growing rapidly, making traditional warehousing more $$$Hadoop is a popular open source map-reduce as an alternative to store and process extremely large data sets on commodity hardwareHowever, map reduce itself is very low-level and required developers to write custom code.01/14/2019 3HIVE - A warehouse solution over Map Reduce FrameworkGeneral Ecosystem of DW01/14/2019HIVE - A warehouse solution over Map Reduce Framework 4HadoopM / RM / RReporting / BI layerSQLETLSQLwhat is hive ?Open-source DW solution built on top of HadoopSupport SQL-like declarative language called HiveQL which are compiled into map-reduce jobs executed on HadoopAlso support custom map-reduce script to be plugged into query.Includes a system catalog, Hive Metastore for query optimizations and data exploration01/14/2019 5HIVE - A warehouse solution over Map Reduce FrameworkHive DatabaseData ModelTables○Analogous to tables in relational database○Each table has a corresponding HDFS dir○Data is serialized and stored in files within dir○Support external tables on data stored in HDFS, NFS or local directory.Partitions○@table can have 1 or more partitions (1-level) which determine the distribution of data within subdirectories of table directory.01/14/2019 6HIVE - A warehouse solution over Map Reduce FrameworkHIVE Database cont.e.q : Table T under /wh/T and is partitioned on column ds + ctryFor ds=20090101ctry=USThen data is stored within dir /wh/T/ds=20090101/ctry=USBuckets○Data in each partition are divided into buckets based on hash of a column in the table. Each bucket is stored as a file in the partition directory.01/14/2019 7HIVE - A warehouse solution over Map Reduce FrameworkHIVE datatypeSupport primitive column types IntegerFloating pointStringsDateBooleanAs well as nestable collections such as array or mapUser can also define their own type programmatically01/14/2019 8HIVE - A warehouse solution over Map Reduce FrameworkData UnitsDatabases.Tables.Partitions.Buckets (or Clusters)..Type SystemPrimitive types–Integers:TINYINT, SMALLINT, INT, BIGINT.–Boolean: BOOLEAN.–Floating point numbers: FLOAT, DOUBLE .–String: STRING.Complex types–Structs: {a INT; b INT}.–Maps: M['group'].–Arrays: ['a', 'b', 'c'], A[1] returns 'b'.Examples – DDL OperationsCREATE TABLE sample (foo INT, bar STRING) PARTITIONED BY (ds STRING); SHOW TABLES '.*s';DESCRIBE sample;ALTER TABLE sample ADD COLUMNS (new_col INT);DROP TABLE sample;Examples – DML OperationsLOAD DATA LOCAL INPATH './sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');LOAD DATA INPATH '/user/falvariz/hive/sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');SELECTS and FILTERSSELECT foo FROM sample WHERE ds='2012-02-24';INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM sample WHERE ds='2012-02-24';INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out' SELECT * FROM sample;hiveQLSupport SQL-like query language called HiveQL for select,join, aggregate, union all and sub-query in the from clauseSupport DDL stmt such as CREATE table with serialization format, partitioning and bucketing columnsCommand to load data from external sources and INSERT into HIVE tables.LOAD DATA LOCAL INPATH ‘/logs/status_updates’ INTO TABLE status_updates PARTITION (ds=‘2009-03-20’)DO NOT support UPDATE and DELETE01/14/2019 14HIVE - A warehouse solution over Map Reduce FrameworkhiveQL cont.Support multi-table INSERTFROM (SELECT a.status, b.schoold, b.gender FROM status_updates a JOIN profiles b ON (a..userid = b.userid) and a.ds=‘2009-03-20’)) subq1INSERT OVERWRITE TABLE gender_summary PARTITION (ds=‘2009-03-20’)SELECT subq1.gender,COUNT(1) GROUP BY subq1.genderINSERT OVERWRITE TABLE school_summary PARTITION (ds=‘009-03-20’)SELECT subq.school, COUNT(1) GROUP BY subq1.school Also support User-defined column transformation (UDF) and aggregation (UDAF) function written in Java01/14/2019 15HIVE - A warehouse solution over Map Reduce FrameworkAggregations and GroupsSELECT MAX(foo) FROM sample;SELECT ds, COUNT(*), SUM(foo) FROM sample GROUP BY ds;FROM sample s INSERT OVERWRITE TABLE bar SELECT s.bar, count(*) WHERE s.foo > 0 GROUP BY s.bar;JoinSELECT * FROM customer c JOIN order_cust o ON (c.id=o.cus_id);SELECT c.id,c.name,c.address,ce.exp FROM customer c JOIN (SELECT cus_id,sum(price) AS exp FROM order_cust GROUP BY cus_id) ce ON (c.id=ce.cus_id);CREATE TABLE customer (id INT,name STRING,address STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '#';CREATE TABLE order_cust (id INT,cus_id INT,prod_id INT,price INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';Multi table insert -Dynamic partition insert FROM page_view_stg pvs INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='US') SELECT pvs.viewTime, … WHERE pvs.country = 'US' INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='CA') SELECT pvs.viewTime, ... WHERE pvs.country = 'CA' INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='UK') SELECT pvs.viewTime, ... WHERE pvs.country = 'UK';FROM page_view_stg pvs INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country) SELECT pvs.viewTime, ...https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Dynamic-PartitionInsertHIVE Architecture01/14/2019 19HIVE - A warehouse solution over Map Reduce FrameworkHIVE ComponentsExternal InterfacesUser Interfaces both CLI and Web UI and API likes JDBC and ODBC.Hive Thrift Serversimple client API to


View Full Document

UT Dallas CS 6350 - HiveBigData

Documents in this Course
HW3

HW3

5 pages

NOSQL-CAP

NOSQL-CAP

23 pages

BigTable

BigTable

39 pages

HW3

HW3

5 pages

Load more
Download HiveBigData
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view HiveBigData and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view HiveBigData 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?