Big Data and Hadoop Developer
Lesson 3: Hadoop Architecture
Copyright 2014 Simplilearn. All rights reserved.

Objectives
By the end of this lesson, you will be able to:
- Describe the use of Hadoop on commodity hardware
- Explain the various configurations and services of Hadoop
- Differentiate between a regular file system and the Hadoop Distributed File System (HDFS)
- Explain HDFS architecture

Hadoop Cluster Using Commodity Hardware: Key Terms
Some key terms used while discussing Hadoop architecture:
- Commodity hardware: PCs that can be used to build a cluster
- Cluster: an interconnection of systems in a network
- Node: a commodity server interconnected with others through a network device

Hadoop Cluster Using Commodity Hardware
Hadoop supports the concept of distributed architecture. The diagram represents the nodes connected and installed with Hadoop. The number of nodes in a rack depends on the network speed.
- Uplink from rack to node: 3 to 4 Gb/s
- Uplink from rack to rack: 1 Gb/s

Hadoop Configuration
Standalone, pseudo-distributed, and fully distributed are the three modes of Hadoop configuration:
- Standalone mode: all Hadoop services run in a single JVM on a single machine.
- Pseudo-distributed mode: individual Hadoop services run in individual JVMs, but on a single machine.
- Fully distributed mode: Hadoop services run in different JVMs but belong to one cluster.

Hadoop Core Services
The core services of Hadoop are:
- NameNode
- DataNode
- JobTracker
- TaskTracker
- Secondary NameNode

Apache Hadoop Core Components
Hadoop HDFS and Hadoop MapReduce are the core components of Hadoop.
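The three configuration modes differ only in where the core services run. The following is a minimal sketch, not Hadoop code, that models each mode as a mapping from core service to a (machine, JVM) pair. The host names, the round-robin placement, and the functions `place_services` and `summarize` are hypothetical; the sketch also assumes one instance of each service, whereas a real fully distributed cluster normally runs a DataNode and TaskTracker on many worker machines.

```python
from typing import Dict, List, Tuple

# The five core services listed in this lesson.
CORE_SERVICES = ["NameNode", "DataNode", "JobTracker", "TaskTracker", "Secondary NameNode"]

Placement = Dict[str, Tuple[str, int]]  # service -> (host, JVM id)


def place_services(mode: str, hosts: List[str]) -> Placement:
    """Illustrative placement of the core services for each configuration mode.

    Host names and the round-robin policy are hypothetical simplifications.
    """
    if mode == "standalone":
        # Every service shares one JVM on one machine.
        return {svc: (hosts[0], 0) for svc in CORE_SERVICES}
    if mode == "pseudo-distributed":
        # One JVM per service, all on the same machine.
        return {svc: (hosts[0], i) for i, svc in enumerate(CORE_SERVICES)}
    if mode == "fully-distributed":
        # One JVM per service, spread round-robin across the cluster's machines.
        return {svc: (hosts[i % len(hosts)], i) for i, svc in enumerate(CORE_SERVICES)}
    raise ValueError(f"unknown mode: {mode}")


def summarize(placement: Placement) -> Tuple[int, int]:
    """Return (number of machines, number of JVMs) used by a placement."""
    machines = {host for host, _ in placement.values()}
    jvms = set(placement.values())
    return len(machines), len(jvms)


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3"]  # hypothetical host names
    for mode in ("standalone", "pseudo-distributed", "fully-distributed"):
        print(mode, summarize(place_services(mode, cluster)))
```

With three hosts, standalone mode uses one machine and one JVM, pseudo-distributed mode uses one machine and five JVMs, and fully distributed mode spreads the five JVMs over all three machines.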

Hadoop Core Components: HDFS
The key features of Hadoop HDFS are as follows:
- It provides high-throughput access to data blocks.
- It provides a limited interface for managing the file system, which allows it to scale.
- It creates multiple replicas of each data block and distributes them on computers throughout the cluster to enable reliable and rapid data access.

Hadoop Core Components: MapReduce
The key features of Hadoop MapReduce are as follows:
- It performs distributed data processing using the MapReduce programming paradigm.
- It lets users define a map phase, which is a parallel, share-nothing processing of the input.
- It aggregates the output of the map phase in a user-defined reduce phase that runs after the map process.

Regular File System vs. HDFS
A simple comparison between a regular file system and HDFS is summarized below.

Regular File System
- Each block of data is small, approximately 512 bytes.
- Large data access suffers from disk I/O problems, mainly because of multiple seek operations.

HDFS
- Each block of data is very large, 64 MB by default.
- Huge data is read sequentially after a single seek operation.

HDFS Characteristics
The basic characteristics of HDFS that make it popular are:
- High fault tolerance
- High throughput
- Suitability for applications with large data sets
- Suitability for applications with streaming access to file system data
- Ability to be built on commodity hardware and heterogeneous platforms

HDFS Key Features
Some key features of HDFS:
- HDFS creates multiple replicas of each data block and distributes them on computers throughout a cluster to enable reliable and rapid access.
- HDFS is the storage system for both the input and output of
  the MapReduce jobs.
- Hadoop file URLs are specified with the hdfs:// scheme, in the form hdfs://<host>/<path>.
- Block storage metadata controls the physical location of each block and its replication within the cluster.
- Each block is replicated to a small number of physically separate machines.

HDFS Architecture
HDFS architecture can be summarized as follows:
- The NameNode and Secondary NameNode services constitute the master service; the DataNode service is the slave service.
- The master service is responsible for accepting a job from clients and ensuring that the data required for the operation is loaded and segregated into chunks of data blocks.
- HDFS exposes a file system namespace and allows user data to be stored in files. A file is split into one or more blocks, which are stored and replicated in DataNodes.
- The data blocks are then distributed to the DataNode systems within the cluster. This ensures that replicas of the data are maintained.

HDFS Operation Principle
The HDFS components comprise different servers: the NameNode, the DataNode, and the Secondary NameNode.

NameNode server (single instance)
- Maintains the file system namespace
- Manages the files and directories in the file system tree
- Stores information in the namespace image and the edit log
- Knows the DataNodes on which all the blocks for a given file exist
- Is a critical single point of failure

Secondary NameNode server (single instance)
- Is not exactly a hot backup of the actual NameNode server
- Is used for recovery of the NameNode in case of NameNode failure
- Periodically updates the namespace image using the edit log
- Has a namespace image that lags behind, so total recovery is impossible

DataNode server (multiple instances)
- Is associated with data storage places in the file system
- Reports to the NameNode periodically with lists of the blocks it stores
- Stores and retrieves blocks when referred by clients or the NameNode
- Serves read/write requests and performs block creation, deletion, and replication upon
  instruction from the NameNode.

HDFS 1.0
HDFS is where data is stored, and Hadoop operations are performed via HDFS.

File System Namespace
Some points related to the file system namespace of HDFS:
- HDFS exposes a file system namespace and allows user data to be stored in files.
- HDFS has a hierarchical file system with directories and files.
- HDFS supports operations such as create, remove, move, and rename.
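The namespace operations above, together with the NameNode's role of knowing which DataNodes hold every block of a file, can be illustrated with a toy model of NameNode metadata. This is a minimal sketch under stated assumptions, not Hadoop's implementation or API: the class `ToyNameNode`, its methods, and the DataNode names are hypothetical; the 64 MB block size comes from this lesson; the replication factor of 3 is an assumption for illustration; and the round-robin replica placement is a deliberate simplification, not HDFS's rack-aware placement policy.

```python
import math
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size cited in this lesson
REPLICATION = 3                # assumed replication factor for illustration


class ToyNameNode:
    """Hypothetical model of NameNode metadata: a hierarchical namespace plus,
    for each file, the DataNodes holding each replica of each block."""

    def __init__(self, datanodes: List[str]) -> None:
        if len(datanodes) < REPLICATION:
            raise ValueError("need at least REPLICATION DataNodes")
        self.datanodes = datanodes
        self.dirs = {"/"}                             # directory paths
        self.files: Dict[str, List[List[str]]] = {}  # path -> replica list per block
        self._next = 0                                # round-robin cursor

    def _check_parent(self, path: str) -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.dirs:
            raise FileNotFoundError(parent)

    def mkdir(self, path: str) -> None:
        # Create the directory and any missing ancestors.
        parts = path.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            self.dirs.add("/" + "/".join(parts[:i]))

    def create(self, path: str, size_bytes: int) -> List[List[str]]:
        # Split the file into fixed-size blocks; the last block may be partial.
        self._check_parent(path)
        blocks = []
        for _ in range(math.ceil(size_bytes / BLOCK_SIZE)):
            # Each replica of a block goes to a different DataNode.
            replicas = [self.datanodes[(self._next + r) % len(self.datanodes)]
                        for r in range(REPLICATION)]
            self._next += 1
            blocks.append(replicas)
        self.files[path] = blocks
        return blocks

    def rename(self, src: str, dst: str) -> None:
        # Move/rename changes metadata only; no block data is copied.
        self._check_parent(dst)
        self.files[dst] = self.files.pop(src)

    def remove(self, path: str) -> None:
        del self.files[path]


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])  # hypothetical DataNodes
    nn.mkdir("/user/data")
    for i, replicas in enumerate(nn.create("/user/data/log.txt", 200 * 1024 * 1024)):
        print(f"block {i}: {replicas}")
```

In this sketch a 200 MB file occupies 4 blocks of 64 MB, each stored on three different DataNodes. For comparison, the same file would span 409,600 blocks at the roughly 512-byte block size of a regular file system, which is why HDFS can read large files with far fewer seeks.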