
11/15/12

Info Gap
Drowning in data, starving for information.

Data Warehousing
Special form of a database. Very large.

Why are we doing data warehousing?
Most people don't throw away data because they are afraid there's good stuff in there that they haven't gotten around to examining. Companies are drowning in data: afraid to throw it out, but not getting anything out of it.

Most information systems are built to support day-to-day operations. We need them because they are essential to our businesses. These systems support short-term tactical decisions; they don't really support strategic decision making. The machines only provide evidence or info for us to make decisions, but we feel better about our decisions if we can back them up with data.

With data warehousing:
Collect info from inside and outside the company.
Take all the data and extract info from it.
Look for cryptic knowledge: stuff that isn't obvious, often a surprise. Relationships that you didn't know were in there.
If you find this stuff, you can have a competitive advantage.

Data warehouse: a subject-oriented, integrated, time-variant, nonvolatile collection of data.
Subject oriented: data in the data warehouse is organized around key subjects.
Integrated: you collect the data from all sorts of different places (could be divisions in your company). Need consistent definitions for things like cost.
Time variant: the only way to look into the future is to look into the past and try to extrapolate forward, so the warehouse keeps its history.
Nonvolatile: we're collecting data and pouring it into the data warehouse all the time, and when we analyze it we're just reading it. We don't delete or modify. It's just read and analyzed by people, so it keeps growing.

Have all this data from different sources. Need to collect it, clean it, and combine it. Must make it consistent and uniform. Then the decision makers ask questions of the data warehouse, trying to use the information to predict what's going on.

2-level architecture: raw data gets cleaned and combined, and then there's polished data in the warehouse.
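The collect, clean, and combine flow above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two source divisions, their field names (`cost_usd`, `cost_cents`), and the `Warehouse` class are all hypothetical and not from the lecture. The sketch tries to show integration (one consistent definition of cost), time variance (every row carries a date), and nonvolatility (rows are only appended and read).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    """One polished warehouse row with a single, consistent cost definition."""
    source: str
    day: date          # time variant: every row is dated
    cost_usd: float


class Warehouse:
    """Nonvolatile store: rows are appended and read, never edited or deleted."""

    def __init__(self):
        self._rows = []

    def append(self, rows):
        self._rows.extend(rows)

    def rows(self):
        # Return a tuple so callers cannot modify the stored rows in place.
        return tuple(self._rows)


def from_division_a(record):
    # Hypothetical division A already reports cost in dollars (as text).
    return Row("A", date.fromisoformat(record["date"]), float(record["cost_usd"]))


def from_division_b(record):
    # Hypothetical division B reports cost in cents; convert to dollars.
    return Row("B", date.fromisoformat(record["date"]), record["cost_cents"] / 100)


# Made-up raw data from the two sources, in their own formats.
raw_a = [{"date": "2012-11-14", "cost_usd": "19.99"}]
raw_b = [{"date": "2012-11-15", "cost_cents": 2550}]

warehouse = Warehouse()
warehouse.append(from_division_a(r) for r in raw_a)
warehouse.append(from_division_b(r) for r in raw_b)

# Analysis only reads the warehouse.
total = sum(row.cost_usd for row in warehouse.rows())
print(f"{len(warehouse.rows())} rows, total cost ${total:.2f}")
```

The frozen dataclass and the missing update/delete methods are one simple way to express "we don't delete or modify"; a real warehouse would enforce this in the database rather than in application code.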
Problem with this is that the data warehouse gets so big that performance can be very slow. Solution is to go to a 3-level architecture, where you add a 3rd layer of selected data: select some data and move it to a place called a data mart. You can select only sales data, or only last year's worth of data, and put it in another place. Can do this for sales data, marketing data, etc. Adds an extra layer, but in exchange we get better response. Reason for a data mart is to get improved performance.

How do you get data into the data warehouse?
The clean-and-combine phase is the extract, transform, load (ETL) process: you extract data from various sources, then you transform the data, then you load it into the data warehouse. Very difficult process.

Steps in ETL:
E: Capture
T: Scrub
T: Transform
L: Load and index

Capture: main issue is that people don't like giving up their data. One way to solve this problem: automate the process if you can (can cost a lot of money). Example: typically people buy gas with credit cards; by the time you leave the gas station, your payment has been processed and the transaction has gone immediately to headquarters. They had to put a credit card machine in every gas pump. They automated the data collection process. Data comes in as it happens, but they had to spend money to do it. May have to modify business operations for Capture.

Scrub: hardest of the 4 steps, because your data is loaded with garbage: misspellings, erroneous dates, etc. If 3 contradicting things are entered into the data, how do you know which of the 3 is right? To cleanse data you may need to use very difficult techniques.

Transform: need to transform all the different forms into one consistent format. Maybe currency conversions, or choice of one unit of measure, like Fahrenheit or Celsius. Combining to make everything uniform.

Load: have to cram this stuff into a database. Problem is the volume.

Data errors: try to correct them at the source; it is harder to do once the data is in the system. Not much motivation for data quality: if people get paid more for quantity of info, they will be less concerned about quality of data. Data gatherers aren't data users.

Air irregularities data warehouse (postal service): the data was so bad they could not conclude anything with confidence. Sufficient quality data couldn't be gathered to prove the errors that were taking place, because the data collection process was flawed. Technical success but business failure. The postal service people filling out air irregularity reports didn't want to do this job, so they did a bad job, so there was bad data collection. The process modifications were so expensive that the cost to repair was more than the value they would've gotten out of it. So they shut the data warehouse down.

Data mart: take a subset of data from the data warehouse to reduce the population so you can improve the response. Maybe you only want data from one country, from one year, from one particular division, etc.

Data Warehousing Structure
What does this data look like? One gigantic table called the fact table.
Star schema: the way a data warehouse is structured. Like an asterisk, all intersecting at some center point: fact table at the center and subjects out on the ends (the data warehouse is subject oriented). A typical one has 15-20 subjects; the picture example only has 4 subjects.

Need to decide what your grain is: the level of your time dimension. Can be storing data per second, per minute, per hour, per day, per week, etc. The finer the grain (per second), the more data compared with a coarser grain (like per day). If you want to know what happened between 12 and 2 (fine detail), then you need a finer grain. Finer grain: finer detail, much bigger data warehouse, can answer more detailed questions.

The data in a fact table (center) can be very large because it can be multiplicative. Example in slides: fact table has almost a billion records and the grain is months, which isn't realistic, so there would be even more records.
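To make the "multiplicative" point concrete, here is a rough back-of-the-envelope sketch. The dimension sizes (stores, products, customers) are made-up numbers chosen only for illustration, not the figures from the slides, and the function name `max_fact_rows` is hypothetical. The result is an upper bound: a real fact table stores a row only for combinations that actually occur.

```python
from math import prod

# Number of time periods in one year at each grain (365-day year assumed).
PERIODS_PER_YEAR = {
    "month": 12,
    "day": 365,
    "hour": 365 * 24,
    "second": 365 * 24 * 60 * 60,
}


def max_fact_rows(dimension_sizes, grain, years=1):
    """Upper bound on fact rows: one row per combination of every
    dimension value and every time period at the chosen grain."""
    return prod(dimension_sizes.values()) * PERIODS_PER_YEAR[grain] * years


# Hypothetical dimension sizes, for illustration only.
dimensions = {"store": 100, "product": 1_000, "customer": 500}

for grain in PERIODS_PER_YEAR:
    print(f"{grain:>6}: {max_fact_rows(dimensions, grain):,} rows at most")
```

With these made-up sizes, moving from a monthly to a daily grain multiplies the bound by roughly 30, which is why the choice of grain drives the size of the warehouse.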



UMD BMGT 301 - Lecture notes
