UT Dallas CS 6350 - BigDataHadoop_PPT_Lesson01

This preview shows page 1-2-3-18-19-37-38-39 out of 39 pages.


Copyright 2014, Simplilearn, All rights reserved.

Big Data and Hadoop Developer
Lesson 1: Introduction to Big Data and Hadoop

Objectives

By the end of this lesson, you will be able to:
● Identify the need for Big Data
● Explain the concept of Big Data
● Describe the basics of Hadoop
● Explain the benefits of Hadoop

Data Explosion

Over 2.5 exabytes (2.5 billion gigabytes) of data is generated every day. Following are some of the sources of this huge volume of data:
● A typical large stock exchange captures more than 1 TB of data every day.
● There are around 5 billion mobile phones (including 1.75 billion smartphones) in the world.
● YouTube users upload more than 48 hours of video every minute.
● Large social networks such as Twitter and Facebook capture more than 10 TB of data daily.
● There are more than 30 million networked sensors in the world.

Types of Data

The following three types of data can be identified:
● Semi-structured data: Data that does not have a formal data model. E.g.: XML files
● Structured data: Data that is represented in a tabular format. E.g.: Databases
● Unstructured data: Data that does not have a predefined data model. E.g.: Text files

Need for Big Data

Following are the reasons why Big Data is needed:
● 90% of the data in the world today has been created in the last two years alone.
● 80% of the data is unstructured or exists in widely varying structures, which are difficult to analyze.
● Structured formats have some limitations with respect to handling large quantities of data.
● It is difficult to integrate information distributed across multiple systems.
● Most business users do not know what should be analyzed.
● Potentially valuable data is dormant or discarded.
● It is too expensive to justify the integration of large volumes of unstructured data.
● A lot of information has a short useful lifespan.
● Context adds meaning to the existing information.

Data: The Most Valuable Resource

“In its raw form, oil has little value. Once processed and refined, it helps power the world.” - Ann Winblad
“Data is the new oil.” - Clive Humby, CNBC

Big Data and Its Sources

Big Data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using on-hand data management tools or traditional data processing applications. The sources of Big Data are:
● web logs;
● sensor networks;
● social media;
● internet text and documents;
● internet pages;
● search index data;
● atmospheric science, astronomy, biochemical and medical records;
● scientific research;
● military surveillance; and
● photography archives.

Three Characteristics of Big Data

Big Data has three characteristics: variety, velocity, and volume.

Variety encompasses managing the complexity of data in many different structures, ranging from relational data to logs and raw text.

Characteristics of Big Data Technology

Following are the characteristics of Big Data technology:
● Cost-efficiently processes the growing volume
● Responds to the increasing velocity
● Collectively analyzes the widening variety

Examples:
● Turned 12 terabytes of Tweets created each day into improved product sentiment analysis
● Converted 350 billion annual meter readings to better predict power consumption

[Figure labels: 50x, 2014, 2024]
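The three data types listed under Types of Data (structured, semi-structured, unstructured) can be illustrated with a minimal Python sketch. This is only an illustration, not part of the original lesson: the sample records and the function names read_structured, read_semi_structured, and read_unstructured are hypothetical, and only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented purely for illustration.
STRUCTURED = "id,symbol,price\n1,ABC,10.5\n2,XYZ,20.0\n"
SEMI_STRUCTURED = (
    "<trades>"
    "<trade id='1' symbol='ABC'><note>opening</note></trade>"
    "<trade id='2'/>"
    "</trades>"
)
UNSTRUCTURED = "Trade one for ABC went through at the open. XYZ followed later."


def read_structured(text):
    """Tabular data: every row carries the same named columns."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Tagged data: records describe themselves, but fields may vary per record."""
    root = ET.fromstring(text)
    return [
        dict(trade.attrib, children=[child.tag for child in trade])
        for trade in root
    ]


def read_unstructured(text):
    """Free text: no predefined fields, so the most we can do here is tokenize."""
    return text.split()


if __name__ == "__main__":
    print(read_structured(STRUCTURED))
    print(read_semi_structured(SEMI_STRUCTURED))
    print(read_unstructured(UNSTRUCTURED))
```

Note how the second XML record omits the symbol attribute and the note child without breaking the parse, whereas the tabular reader expects every row to follow the same header. The free text offers no fields at all, which is one reason the lesson describes unstructured data as difficult to analyze.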
Characteristics of Big Data Technology

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. (Source: Gartner)

Appeal of Big Data Technology

Big Data technology is appealing for the following reasons:
● It helps to manage and process a huge amount of data cost-efficiently.
● It analyzes data in its native form, which may be unstructured, structured, or streaming.
● It captures data from fast-happening events in real time.
● It can handle failure of isolated nodes and tasks assigned to such nodes.
● It can turn data into actionable insights.

[Figure labels: ERP, machine data, web, social media, billing, network elements]

Leveraging Multiple Sources of Data

Big Data technology enables IT to leverage multiple sources of data. Following are some of the sources:

Application data
● High volume
● Structured
● High throughput

Machine data
● High velocity
● Semi-structured
● Ingestion at a high speed

Social data
● Variety
● Highly unstructured
● Veracity

Enterprise data
● Variety
● Highly unstructured
● High volume

Traditional IT Analytics Approach

The following are the requirements of the traditional IT analytics approach and the factors that challenge them:

Requirements
● The business team needs to define questions before IT development.
● They need to define data sources and structures.

Challenging factors
● The requirements are iterative and volatile.
● The data sources keep changing.

Traditional IT Analytics Approach

In a typical scenario of traditional IT systems development, the requirements are defined, followed by solution design and build. Once the solution is implemented, queries are executed. If there are new requirements or queries, the system is redesigned and rebuilt.

1. Define requirements
2. Design solution
3. Execute queries
4. Redesign and rebuild for new requirements

Big Data Technology: Platform for Discovery and Exploration

Following are the requirements for using Big Data technology as a platform for discovery and exploration, and the challenges overcome by the same:

Requirements
● The business team needs to define data sources.
● They need

