UT Dallas CS 6350 - BigDataHadoop_PPT_Lesson11

This preview shows page 1-2-3-4-31-32-33-34-35-64-65-66-67 out of 67 pages.

Unformatted text preview:

Copyright 2014, Simplilearn, All rights reserved

Lesson 11: ZooKeeper, Sqoop, and Flume
Big Data and Hadoop Developer

Objectives
By the end of this lesson, you will be able to:
● Explain ZooKeeper and its role
● List the challenges faced in distributed processing
● Install and configure ZooKeeper
● Explain the concept of Sqoop
● Install and configure Sqoop
● Explain the concept of Flume
● Configure and run Flume

Introduction to ZooKeeper
ZooKeeper is an open-source and high performance co-ordination service for distributed applications. It offers the following services:
● Naming
● Configuration management
● Locks and synchronization
● Group services

Features of ZooKeeper
Some salient features of ZooKeeper are as follows:
● Provides a simple and high performance kernel for building complex clients
● Follows first-in-first-out approach for executing jobs
● Comes with pipeline architecture to achieve a wait-free approach
● Applies multi-processing approach to avoid the wait-time for process execution
● Provides distributed co-ordination services for distributed applications
● Allows synchronization, serialization, and co-ordination of nodes in Hadoop cluster
● Takes care of problems by using in-built algorithms for deadlock detection and prevention
● Allows for distributed processing

Challenges Faced in Distributed Applications
The following are the common challenges faced in distributed applications:
● Error-prone coordination
● Race conditions
● Deadlocks
● Partial failures
● Inconsistencies

Coordination
The key points related to coordination are:
● Group membership
● Dynamic configuration
● Queuing
● Leader election
● Status monitoring
● Critical sections

Goals of ZooKeeper
Following are the goals of ZooKeeper:
● Serialization: avoids delays in read or write operations.
● Reliability: once a user's update is applied, it persists in the cluster.
● Atomicity: partial results are not allowed. Any user update either succeeds or fails.
● Simple Application Programming Interface (API): provides an interface for development and implementation.

Uses of ZooKeeper
The uses of ZooKeeper are as follows:
● Configuration
● Message queue
● Notification
● Synchronization

ZooKeeper Entities
ZooKeeper comprises the following three entities:
● Leader
● Follower
● Observer

ZooKeeper Data Model
ZooKeeper has a hierarchical namespace. Each node in the namespace is called a Znode.

ZooKeeper Services
The following points are related to Znode:
Znode
● In-memory data node
● Hierarchical namespace
● Follows UNIX-like notation
Types of Znode
● Regular
● Ephemeral
Flags of Znode
● Sequential flag

ZooKeeper Services (contd.)
Some features of Znode are as follows:
Watch mechanism feature
● Receives notification from nodes
● Enables one-time triggers
Other features
● Stores metadata or configuration
● Stores information like timestamp and version
Timeout mechanism
● Permits allocation of resources for a limited time period

Client API Functions
Given below is a list of client API functions:
● create (path, data, and flag)
● delete (path and version)
● exists (path and watch)
● getData (path and watch)
● setData (path, data, and version)
● getChildren (path and watch)
● sync (path)

Recipe 1: Cluster Management
Recipes are guidelines for using ZooKeeper to implement higher-order functions.

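The recipes build on the Znode types, flags, and watches listed above, so a small model of those primitives may help. The following is only a rough in-memory sketch in Python, not the ZooKeeper client API: the class ToyZnodeStore, its method names, and the session strings are hypothetical, and rules that the real service enforces (for example, refusing to delete a Znode that still has children) are left out. The 10-digit zero padding of sequence numbers is an assumption chosen for this illustration.

```python
class ToyZnodeStore:
    """Hypothetical in-memory stand-in for a ZooKeeper namespace (illustration only)."""

    def __init__(self):
        # path -> [data, version, owning session (None means a regular Znode)]
        self.nodes = {"/": [b"", 0, None]}
        self.counters = {}   # parent path -> next sequence number
        self.watches = {}    # (kind, path) -> callbacks; kind is "data" or "children"

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def _watch(self, kind, path, callback):
        if callback is not None:
            self.watches.setdefault((kind, path), []).append(callback)

    def _fire(self, kind, path):
        # One-time trigger: callbacks are removed from the table before they run.
        for callback in self.watches.pop((kind, path), []):
            callback(path)

    def create(self, path, data=b"", session=None, sequential=False):
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"parent znode {parent} does not exist")
        if sequential:
            # Sequential flag: append a per-parent, zero-padded counter to the name.
            number = self.counters.get(parent, 0)
            self.counters[parent] = number + 1
            path = f"{path}{number:010d}"
        if path in self.nodes:
            raise KeyError(f"znode {path} already exists")
        self.nodes[path] = [data, 0, session]
        self._fire("children", parent)
        return path

    def get_data(self, path, watch=None):
        data, version, _ = self.nodes[path]
        self._watch("data", path, watch)
        return data, version

    def set_data(self, path, data, version=-1):
        node = self.nodes[path]
        if version not in (-1, node[1]):   # -1 means "any version"
            raise ValueError("version mismatch")
        node[0], node[1] = data, node[1] + 1
        self._fire("data", path)

    def get_children(self, path, watch=None):
        self._watch("children", path, watch)
        prefix = path.rstrip("/") + "/"
        names = (p[len(prefix):] for p in self.nodes if p.startswith(prefix))
        return sorted(n for n in names if n and "/" not in n)

    def delete(self, path, version=-1):
        node = self.nodes[path]
        if version not in (-1, node[1]):
            raise ValueError("version mismatch")
        del self.nodes[path]
        self._fire("data", path)
        self._fire("children", self._parent(path))

    def close_session(self, session):
        # Ephemeral Znodes vanish when the session that created them ends.
        if session is None:
            return
        for path in [p for p, n in self.nodes.items() if n[2] == session]:
            self.delete(path)
```

In this sketch, a client would call create("/members/host-1", session="s1") to register an ephemeral Znode and get_children("/members", watch=callback) to be notified once about the next membership change; after the callback fires, the watch has to be set again.
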
The recipe for cluster management, for example in cloud environments, is given below. For each client host i, where i = 1..N:
● Set a watch on /members.
● Create /members/host-${i} as an ephemeral node.
● A node joining or leaving generates an alert.
● Keep updating /members/host-${i} periodically for node status changes (load, memory, CPU, etc.).

Recipe 2: Leader Election
The recipe for leader election is as follows:
● All participants create an ephemeral-sequential node on the same election path.
● The node with the smallest sequence number is the leader. Each 'follower' node listens to the node with the next lower sequence number.
● When the leader is removed, go to the election path and find a new leader.
● When a session expires, check the election state and go to election if needed.

Recipe 3: Distributed Exclusive Lock
The recipe for a distributed exclusive lock, assuming there are N web crawler clients trying to acquire a lock on links data, is as follows:
● Clients create an ephemeral, sequential Znode under the path /Cluster/_locknode_.
● Clients request a list of children for the lock Znode (i.e., _locknode_).
● The client with the least ID according to natural ordering holds the lock. Every other client sets a watch on the Znode with the ID immediately preceding its own and checks for the lock again when notified.
● A client wishing to release the lock deletes its node, which triggers the next client in line to acquire the lock.

Business Scenario
As part of his current project, Tim Burnet, the AVP of IT-infra ops, anticipates that his superior, Olivia Tyler, the EVP of IT operations, will ask him to work on a high-performance coordination service for distributed applications. Tim knows that he has to use ZooKeeper for this task. He wants to be prepared and decides to install ZooKeeper.
The demo in the next section illustrates how to install and configure ZooKeeper.

Demo 1
View ZooKeeper nodes using CLI

Why Sqoop
Sqoop is an Apache Hadoop ecosystem project responsible for import and export operations between Hadoop and relational databases. Some reasons to use Sqoop are as follows:
● SQL servers are deployed worldwide
● Nightly processing is done on SQL servers
● Allows to move certain part of data

