UT Dallas CS 6350 - BigDataHadoop_PPT_Lesson01

This preview shows pages 1-3, 18-19, and 37-39 of 39 pages.

Big Data and Hadoop Developer
Lesson 1: Introduction to Big Data and Hadoop
Copyright 2014 Simplilearn. All rights reserved.

Objectives

By the end of this lesson, you will be able to:
- Identify the need for Big Data
- Explain the concept of Big Data
- Describe the basics of Hadoop
- Explain the benefits of Hadoop

Data Explosion

Over 2.5 exabytes (2.5 billion gigabytes) of data is generated every day. Following are some of the sources of this huge volume of data:
- A typical large stock exchange captures more than 1 TB of data every day.
- There are around 5 billion mobile phones, including 1.75 billion smartphones, in the world.
- YouTube users upload more than 48 hours of video every minute.
- Large social networks such as Twitter and Facebook capture more than 10 TB of data daily.
- There are more than 30 million networked sensors in the world.

Types of Data

The following three types of data can be identified:
- Structured data: data that is represented in a tabular format, e.g., databases.
- Semi-structured data: data that does not have a formal data model, e.g., XML files.
- Unstructured data: data that does not have a predefined data model, e.g., text files.

Need for Big Data

Following are the reasons why Big Data is needed:
- 90% of the data in the world today has been created in the last two years alone.
- 80% of the data is unstructured or exists in widely varying structures, which are difficult to analyze.
- Structured formats have some limitations with respect to handling large quantities of data.
- It is difficult to integrate information distributed across multiple systems.
- Most business users do not know what should be analyzed.
- Potentially valuable data is dormant or discarded.
- It is too expensive to justify the integration of large volumes of unstructured data.
- A lot of information has a short useful lifespan.
- Context adds meaning to the existing information.

Data: The Most Valuable Resource

"In its raw form, oil has little value. Once processed and refined, it helps power the world." (Ann Winblad)
"Data is the new oil." (Clive Humby, CNBC)

Big Data and Its Sources

Big Data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using on-hand data management tools or traditional data processing applications.

The sources of Big Data are web logs, sensor networks, social media, internet text and documents, internet pages, search index data, atmospheric science, astronomy, biochemical and medical records, scientific research, military surveillance, and photography archives.

Three Characteristics of Big Data

Big Data has three characteristics: variety, velocity, and volume. Variety encompasses managing the complexity of data in many different structures, ranging from relational data to logs and raw text.

Characteristics of Big Data Technology

Following are the characteristics of Big Data technology:
- Cost-efficiently processes the growing volume
- Responds to the increasing velocity
- Collectively analyzes the widening variety

Examples:
- Turned 12 terabytes of Tweets created each day into improved product sentiment analysis
- Converted 350 billion annual meter readings to better predict power consumption

(Figure labels: 50x, 2014, 2024)

Big Data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (Source: Gartner)

Appeal of Big Data Technology

Big Data technology is appealing for the following reasons:
- It helps to manage and process a huge amount of data cost-efficiently.
- It analyzes data in its native form, which may be unstructured, structured, or streaming.
- It captures data from fast-happening events in real time.
- It can handle failure of isolated nodes and tasks assigned to such nodes.
- It can turn data into actionable insights.

(Figure labels: social media, web, billing, ERP, machine data, network elements)

Leveraging Multiple Sources of Data

Big Data technology enables IT to leverage multiple sources of data. Following are some of the sources:
- Application data: high volume, structured, high throughput
- Machine data: high velocity, semi-structured, ingestion at a high speed
- Enterprise data: variety, highly unstructured, veracity
- Social data: variety, highly unstructured, high volume

Traditional IT Analytics Approach

The following are the requirements of the traditional IT analytics approach and the factors that challenge them.

Requirements:
- The business team needs to define questions before IT development.
- They need to define data sources and structures.

Challenging factors:
- The requirements are iterative and volatile.
- The data sources keep changing.

In a typical scenario of traditional IT systems development, the requirements are defined, followed by solution design and build. Once the solution is implemented, queries are executed. If there are new requirements or queries, the system is redesigned and rebuilt.

(Figure: a cycle of define requirements, design solution, execute queries, and redesign and rebuild for new requirements)

Big Data Technology Platform for Discovery and Exploration

Following are the requirements for using Big Data technology as a platform for discovery and exploration, and the challenges it overcomes.

Requirements:
- The business team needs to define data sources.
- They need to establish the hypothesis.

Challenges overcome by Big Data:
- The technology should enable explorative analysis.
- Data systems and sources need to be integrated as required.

The image illustrates how IT systems are built with the help of Big Data technology:
- Identify data sources
- New questions lead to addition of data sources and integration
- Create a platform for creative exploration of available data and content
- Determine
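
As an illustration of the three types of data named earlier in this lesson (structured, semi-structured, and unstructured), the following is a minimal Python sketch using only the standard library. The sample records, field names, and variable names are hypothetical and are not taken from the lesson; the sketch only shows how each kind of data might be read, not how Hadoop handles it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured data: tabular rows that all share the same columns
# (hypothetical sample, standing in for a database table).
structured = "id,name,city\n1,Ann,Dallas\n2,Raj,Austin\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured data: tags describe each value, but records
# need not share the same fields (the second user has a phone).
semi_structured = (
    "<users>"
    "<user><name>Ann</name></user>"
    "<user><name>Raj</name><phone>555-0100</phone></user>"
    "</users>"
)
root = ET.fromstring(semi_structured)
users = [{field.tag: field.text for field in user} for user in root.findall("user")]

# Unstructured data: free text with no predefined fields;
# without further analysis we can only do things like count words.
unstructured = "Ann called from Dallas about her bill."
word_count = len(unstructured.split())

print(rows[0]["city"])
print(users)
print(word_count)
```

The structured rows can be queried by column name right away, the XML records have to be inspected to find out which fields each one carries, and the free text offers no fields at all, which is one reason the lesson describes unstructured data as difficult to analyze.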

