IST 195 1st Edition Lecture 6

These notes represent a detailed interpretation of the professor’s lecture. GradeBuddy is best used as a supplement to your own notes, not as a substitute.

Outline of Last Lecture
I. Excel
II. Tools on Excel
III. VLOOKUP
IV. Pivot Tables

Outline of Current Lecture
I. Why is Big Data Helpful?
   a. The Four V’s
II. Where does Big Data Come From?
III. The Beginning of Data
IV. Issues with Big Data
V. Types of Big Data

Current Lecture

Why Is Big Data Helpful
- big data is our way of processing this data
- the Four V's of Big Data
  o velocity - the speed of information coming in, analysis of streaming data
  o variety - the different forms of data (tweets, videos, wearable technology, etc.)
  o veracity - accuracy of data
  o volume - the amount of data
- people perceive things certain ways and we tend to be wrong; this is why we need statistics (think Moneyball)
- poor data quality and knowledge can cost us up to 30 million dollars per year
- before big data, companies paid database vendors to process data, which took months or even years; we are now collecting information far too quickly for this

Where Does Big Data Come From?
- big data comes from websites, social networks, e-commerce, finances, public data and a lot more
- big data is not just some buzzword
- we are now in information overload
- Google processes 25 PB per day, Twitter sees 400 million tweets a day, NSA collects 20 PB per day (the size of a dime on a basketball court)
- that's a whole lot of bytes

The Beginnings of Big Data
- Google was one of the first to deal with the problem of managing data
  I. they take the data they have, chop it up into smaller pieces, and map it
  II. next the chopped information is sent to many computers; this is parallel computing (computers working simultaneously on different parts of one project)
  III. MapReduce (an algorithm built by Google, which published the design for FREE) brings the results together
  IV. these results go to Hadoop, an open source project developed largely at Yahoo
     o Hadoop startups like Hortonworks, Cloudera, and MapR Inc. are receiving hundreds of millions of dollars for their big data solutions
- Databricks is a newer big data company, founded by the creators of "Spark" (a newer big data processing engine)

Why People Have Big Data Issues
- sometimes there is not the right information system to grab data
- lack of access to or knowledge about information
- system may not meet the needs of industry
- timeliness and presentation of information

Types of Data
- unstructured - structure is not formally defined (different students take notes differently)
- semi-structured (ex. emails)
- structured (ex. Excel files)
- geospatial - information based on specific location (ex. Foursquare, Facebook
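
The chop, map, and bring-the-results-together steps listed under The Beginnings of Big Data can be sketched as a small word count. This is only a rough illustration under stated assumptions, not Google's or Hadoop's actual code: the function names (chop, map_piece, reduce_counts, word_count) are made up for this example, and threads on a single machine stand in for the "many computers."

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chop(words, pieces):
    """Step I: chop the data into roughly equal smaller pieces."""
    size = max(1, -(-len(words) // pieces))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_piece(piece):
    """Step II: each worker counts the words in its own piece."""
    return Counter(piece)


def reduce_counts(partial_counts):
    """Step III: bring the partial results together into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(text, workers=4):
    """Hypothetical driver that runs the three steps in order."""
    words = text.lower().split()
    pieces = chop(words, workers)
    # Assumption: threads stand in for many computers working at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_piece, pieces))
    return reduce_counts(partial_counts)


if __name__ == "__main__":
    print(word_count("big data is big and data is everywhere"))
```

Each worker only ever sees its own piece, so adding more workers (or, in a real system, more computers) lets the same job handle more data without changing the counting logic.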