UT Dallas CS 6350 - BigDataHadoop_PPT_Lesson07

Copyright 2014, Simplilearn, All rights reserved.

Lesson 7: Pig
Big Data and Hadoop Developer

Objectives

By the end of this lesson, you will be able to:
● Explain the concepts of Pig
● Demonstrate the installation of a Pig engine
● Explain the prerequisites for preparing the environment for Pig Latin

Challenges of MapReduce Development Using Java

The following are some common challenges of MapReduce programming in Java:
● The developer has to write custom Java code, which increases production time.
● The code is difficult to maintain, optimize, and extend.
● The developer needs to think in terms of map, split, and reduce fundamentals, which may increase production time.
● The data flow is extremely rigid.

Introduction to Pig

● Pig is one of the components of the Hadoop ecosystem.
● Pig is a high-level data flow scripting language.
● Pig is an Apache open-source project.
● Pig runs on Hadoop clusters.
● Pig uses HDFS for storing and retrieving data and Hadoop MapReduce for processing Big Data.

Components of Pig

The following are the major components of Pig:

Pig Latin script language
● Procedural data flow language
● Contains syntax and commands that can be applied to implement business logic
● Example: LOAD, STORE, and so on

Runtime engine
● Compiler that produces sequences of MapReduce programs
● Uses HDFS for storing and retrieving the data
● Used to interact with the Hadoop system
● Parses, validates, and compiles the script operations into a sequence of MapReduce jobs

How Pig Works

Pig's operation can be explained in three stages:

1. Load data and write Pig script

    A = LOAD 'myfile' AS (x, y, z);
    B = FILTER A BY x > 0;
    C = GROUP B BY x;
    D = FOREACH C GENERATE group, COUNT(B);
    STORE D INTO 'output';

2. Pig operations
● Parses the script
● Checks the script
● Optimizes the script
● Plans execution
● Submits the plan to Hadoop
● Monitors job progress

3. Execution of the plan
● Map: Filter
● Reduce: Count

Data Model

As part of its data model, Pig supports four basic types:

Atom
● A simple atomic value
● Example: 'Mike'

Tuple
● A sequence of fields that can be any of the data types
● Example: ('Mike', 43)

Bag
● A collection of tuples of potentially varying structures; can contain duplicates
● Example: {('Mike'), ('Doug', (43, 45))}

Map
● An associative array; the key must be a chararray, but the value can be any type
● Example: [name#Mike,phone#5551212]

Data Model (contd.)

By default, Pig treats undeclared fields as bytearrays. Pig can infer a field's type based on:
● use of operators that expect a certain type of field;
● UDFs (User Defined Functions) with a known or explicitly set return type; and
● schema information provided by a LOAD function or explicitly declared using an AS clause.

Note: Type conversion is lazy, which means the data type is enforced only at execution.

Nested Data Model

Pig Latin has a fully nestable data model with atomic values, tuples, bags, and maps.
Advantages:
● It is more natural to programmers than flat tuples.
● It avoids expensive joins.

Pig Execution Modes

Pig works in two execution modes:
● Local mode: Pig depends on the local OS file system.
● MapReduce mode: Pig relies on HDFS.

Pig Interactive Modes

A Pig Latin program can be written and run in two modes:
● Interactive mode: the code is written and executed line by line.
● Batch mode: a file containing Pig scripts is created and executed as a batch.

Salient Features

Some of the salient features of Pig are as follows:
● Step-by-step procedural control
● Operates directly over files
● Schemas are optional
● Supports UDFs
● Supports various data types

Pig vs. SQL

The differences between Pig and SQL are given in the table below:

Difference      | Pig                                           | SQL
Definition      | Scripting language used to interact with HDFS | Query language used to interact with databases
Query style     | Step-by-step                                  | Single block
Evaluation      | Lazy evaluation                               | Immediate evaluation
Pipeline splits | Pipeline splits are supported                 | Requires the join to be run twice or materialized as an intermediate result

Pig vs. SQL: Example

Track customers in Texas who spend more than 2000 USD.

SQL

    SELECT c.c_id, SUM(s.amount) AS CTotal
    FROM customers c
    JOIN sales s ON c.c_id = s.c_id
    WHERE c.city = 'Texas'
    GROUP BY c.c_id
    HAVING SUM(s.amount) > 2000
    ORDER BY CTotal DESC

Pig

    customer   = LOAD '/data/customer.dat' AS (c_id, name, city);
    sales      = LOAD '/data/sales.dat' AS (s_id, c_id, date, amount);
    customerTX = FILTER customer BY city == 'Texas';
    joined     = JOIN customerTX BY c_id, sales BY c_id;
    grouped    = GROUP joined BY customerTX::c_id;
    summed     = FOREACH grouped GENERATE group, SUM(joined.sales::amount);
    spenders   = FILTER summed BY $1 > 2000;
    sorted     = ORDER spenders BY $1 DESC;
    DUMP sorted;

Installing Pig Engine

To install the Pig engine, get the correct mirror link from the website http://pig.apache.org.

Steps to Install Pig Engine

Perform the following steps to install the Pig engine:
1. Unzip the downloaded file.
2. Move the extracted folder to the following location: /usr/local/Pig.
3. Export the JAVA_HOME path.
4. Export the Pig_PREFIX path.
5. Set the path for the bin folder of the Pig setup.
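The five installation steps above can be sketched as a small shell function. This is only an illustration under stated assumptions, not the lesson's own commands: the function name install_pig, the .tar.gz archive format, the fallback JAVA_HOME location, and the upper-case spelling PIG_PREFIX for the variable the slides call Pig_PREFIX are all hypothetical choices made for this sketch.

```shell
# Minimal sketch of the five installation steps.
# Usage: install_pig ARCHIVE DEST
#   ARCHIVE  path to the downloaded Pig archive (assumed to be a .tar.gz)
#   DEST     target directory, e.g. /usr/local/Pig (must not exist yet)
install_pig() {
    archive="$1"
    dest="$2"

    # Refuse to overwrite or nest inside an existing installation.
    [ ! -e "$dest" ] || return 1

    # Step 1: unpack the downloaded file into a scratch directory.
    workdir=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$workdir" || return 1

    # Step 2: move the extracted folder to the destination.
    # Assumes the archive contains a single top-level directory.
    extracted=""
    for d in "$workdir"/*/; do
        extracted="${d%/}"
        break
    done
    [ -d "$extracted" ] || return 1
    mv "$extracted" "$dest" || return 1
    rmdir "$workdir"

    # Step 3: export JAVA_HOME. An existing value is kept; the fallback
    # path is an assumption and will differ between systems.
    export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

    # Step 4: export the Pig prefix (variable spelling assumed).
    export PIG_PREFIX="$dest"

    # Step 5: put the bin folder of the Pig setup on the PATH.
    export PATH="$PATH:$PIG_PREFIX/bin"
}
```

A call might look like install_pig "$HOME/Downloads/pig.tar.gz" /usr/local/Pig, where the archive path is a placeholder. Writing under /usr/local usually needs root privileges, and the exports only last for the current shell session unless they are also added to a shell profile.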

