UT Dallas CS 6350 - BigDataHadoop_PPT_Lesson13


Copyright 2014, Simplilearn, All rights reserved

Lesson 13 — Hadoop Administration, Troubleshooting, and Security
Big Data and Hadoop Developer

Objectives
By the end of this lesson, you will be able to:
● Explain the different configurations of a Hadoop cluster
● Identify parameters for performance monitoring and performance tuning
● Explain the configuration of security parameters in Hadoop

Typical Hadoop Core Cluster
A typical Hadoop Core cluster is made up of machines that run a set of cooperating server processes.
● The machines in the cluster are not required to be homogeneous.
● If the machines have similar processing power, memory, and disk bandwidth, cluster administration becomes much easier, because only one set of configuration files and runtime environments needs to be maintained and distributed.

Load Balancer
The balancer is a tool for rebalancing the distribution of data across the cluster once requests are generated by users or applications.
● start-balancer.sh — starts the balancer
● stop-balancer.sh — stops the balancer

Commands Used in Hadoop Programming
Different commands are used in Hadoop programming. The Hadoop Core servers load their configurations from files in the configuration directory of the Hadoop Core installation. The JobTracker is expected to run on the machine on which the scripts are executed.
● slaves.sh — runs its arguments as a command on each of the hosts listed in the conf/slaves file
● start-mapred.sh and stop-mapred.sh — start and stop the MapReduce servers; these scripts start or stop only the JobTracker and TaskTracker nodes

Configuration files are responsible for configuring the system for a specific task.
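The fan-out behavior of slaves.sh can be illustrated with a short sketch: read a slaves file and build one remote command per listed host. This is a simplified illustration, not Hadoop's actual script (which loops over the hosts in shell and runs ssh); the file contents and hostnames below are made up.

```python
# Sketch of slaves.sh-style fan-out: one command per host listed in a
# conf/slaves-style file (one hostname per line; blank lines ignored).
def build_remote_commands(slaves_text, command):
    """Return one ssh invocation per host listed in the slaves file."""
    hosts = [line.strip() for line in slaves_text.splitlines() if line.strip()]
    return [f"ssh {host} '{command}'" for host in hosts]

# Hypothetical slaves file listing three worker nodes.
slaves_file = "node1\nnode2\nnode3\n"
for cmd in build_remote_commands(slaves_file, "jps"):
    print(cmd)
```

Because every worker receives the same command, keeping the cluster homogeneous (as the slide notes) means a single slaves file and one set of scripts covers the whole cluster.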
Different Configuration Files of Hadoop Cluster
● core-site.xml — defines the NameNode and the HDFS temporary directory
● mapred-site.xml — defines the number of reducers and mappers and other settings related to MapReduce operations
● hadoop-env.sh — sets Hadoop environment settings such as the Java path and security settings
● masters — specifies the Secondary NameNode in a clustered environment
● slaves — specifies the DataNodes in a clustered environment

Properties of hadoop-default.xml
hadoop-default.xml is used to set the parameters that maintain consistency across the Hadoop cluster with respect to distributed computing. Properties defined through hadoop-default.xml:
● I/O properties — related to input and output operations to and from the HDFS cluster
● File system properties — related to the input and output files during job execution
● MapReduce properties — settings related to proper job execution, such as the number of mappers
● IPC properties — settings related to inter-process communication
● Global properties — settings that must be maintained throughout the cluster
● Logging properties — settings related to log generation and maintenance

Different Configurations for Hadoop Cluster
Three critical parameters that must be configured for any Hadoop cluster are as follows:
● hadoop.tmp.dir — used as a temporary directory for both the local file system and HDFS
● fs.default.name — specifies the NameNode machine's hostname and port number
● mapred.job.tracker — defines the host and port on which the MapReduce JobTracker runs

Different Configurations for Hadoop Cluster (contd.)
Three critical parameters that must be configured for any Hadoop DFS are as follows:
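The critical parameters above live in Hadoop's XML configuration files as <property> elements with <name> and <value> children. A minimal sketch of generating such a file with Python's standard library — the hostnames, ports, and paths are placeholder assumptions, not recommendations:

```python
import xml.etree.ElementTree as ET

# Build a minimal core-site.xml/mapred-site.xml-style <configuration>
# document from a dict of property names and values.
def make_configuration(props):
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Placeholder hostnames, ports, and paths for illustration only.
xml_text = make_configuration({
    "hadoop.tmp.dir": "/tmp/hadoop-tmp",            # temp dir for local FS and HDFS
    "fs.default.name": "hdfs://namenode-host:9000", # NameNode hostname and port
    "mapred.job.tracker": "jobtracker-host:9001",   # JobTracker host and port
})
print(xml_text)
```

Each entry becomes one `<property><name>…</name><value>…</value></property>` block, which is the shape both core-site.xml and mapred-site.xml expect.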
dfs.name.dir
● Determines where on the local file system the NameNode metadata is stored
● May be a comma- or space-separated list of directories
● All the provided directories are used for redundant storage

dfs.data.dir
● Determines where on the local file system a DataNode stores blocks
● May be a comma- or space-separated list of directories
● Data is distributed among the directories
● HDFS replicates data storage blocks to multiple DataNodes
● The directory experiences bulk I/O transactions

mapred.local.dir
● A local directory where the TaskTracker stores intermediate output
● May be a comma-separated list of directories, preferably on different devices, so that I/O is spread among the directories for increased performance
● The directory experiences bulk I/O that has a short life

Port Numbers for Individual Hadoop Services
Port numbers for individual Hadoop services can be classified as follows:

Port Number  Name of the Parameter              Explanation for the Parameter
50030        mapred.job.tracker.http.address    JobTracker administrative web GUI
50070        dfs.http.address                   NameNode administrative web GUI
50010        dfs.datanode.address               DataNode control port
50020        dfs.datanode.ipc.address           DataNode IPC port, used for block transfer
50060        mapred.task.tracker.http.address   Per-TaskTracker web interface
50075        dfs.datanode.http.address          Per-DataNode web interface
50090        dfs.secondary.http.address         Secondary NameNode web interface
50470        dfs.https.address                  NameNode web GUI via HTTPS
50475        dfs.datanode.https.address         Per-DataNode web GUI via HTTPS

Performance Monitoring
The performance of the cluster needs to be monitored to ensure that resources are properly allocated and de-allocated for optimum utilization. This ensures that the resources are not idle. The Hadoop framework provides several APIs that allow external agents to provide monitoring services to the Hadoop Core services.
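dfs.name.dir and dfs.data.dir accept a comma- or space-separated list of directories, as noted above. A small sketch of how such a value might be split into individual paths — the directory names are made up, and this is an illustration of the format, not Hadoop's own parsing code:

```python
import re

# Split a dfs.name.dir/dfs.data.dir-style value into individual
# directories, accepting commas and/or whitespace as separators.
def split_dir_list(value):
    return [d for d in re.split(r"[,\s]+", value.strip()) if d]

# Hypothetical value: NameNode metadata mirrored onto two disks.
dirs = split_dir_list("/disk1/dfs/name, /disk2/dfs/name")
print(dirs)
```

For dfs.name.dir, every directory in the resulting list holds a redundant copy of the NameNode metadata; for dfs.data.dir, block storage is instead spread across the listed directories.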
Following are a few such agents:
● JMX
● Nagios
● Ganglia
● Chukwa
● FailMon

Performance Tuning
Performance tuning is a method of making resources participate in a specific job so that the job is done faster and better. Factors considered during performance tuning:
● Network bandwidth
● Disk throughput
● CPU overhead
● Memory

Parameters of Performance Tuning
Performance tuning is carried out tactfully by using the following parameters:
● dfs.datanode.handler.count — sets the number of server threads for the DataNode
● dfs.datanode.du.reserved — used to reserve space, in bytes, per volume
● dfs.replication — Used
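Since dfs.datanode.du.reserved takes a raw byte count per volume, it is often computed from a human-readable size. A small sketch of that conversion — the 10 GB figure is an arbitrary example, not a tuning recommendation:

```python
# Convert a human-readable size into the byte value expected by
# dfs.datanode.du.reserved (bytes reserved per volume for non-HDFS use).
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}

def to_bytes(amount, unit):
    return int(amount * UNITS[unit])

# Arbitrary example: reserve 10 GB on each DataNode volume.
reserved = to_bytes(10, "GB")
print(f"dfs.datanode.du.reserved = {reserved}")  # 10737418240
```

Reserving space this way leaves headroom on each volume for the operating system, logs, and MapReduce intermediate output (mapred.local.dir), which the slides note generates bulk short-lived I/O.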

