Intro to OLAP and Data Mining (11/20/12)

Keeping track of inventory, cash flows, and employees (all transactional data) is tactical. Asking "Am I in the right business? Is the environment changing? Are the rules changing?" is strategic planning, and that is where data warehousing comes in.

3 main reasons to do data warehousing:
1. To improve existing business processes and resolve data irregularities.
2. To understand historical phenomena. You're analyzing data and something comes up that isn't right, so you start diving down into your data to figure out why it happened.
3. To seek patterns and "golden nuggets": strange relationships that you didn't see or couldn't have predicted, but that could give you some sort of competitive advantage.

Data warehousing and data mining go together; you can't have one without the other.

Examples of data mining (DM) use
Some French wines are very expensive, and rich people trade them just like they trade stocks. When a wine comes out, how do you know whether it will be a super wine at a high price or a mediocre one? An economist collected a large amount of data on wine sales and historical prices, analyzed it against weather conditions around France, and discovered a relationship. By data mining historical prices together with weather information, he came up with a formula that predicts future wine prices better than all the experts put together, AND he can do it before the grapes are even picked.

A doctor goes to medical school and learns that if a patient shows certain symptoms, the patient has a certain disease. Now that medicine has started doing data mining, it is seeing that certain treatments are more effective than others in some situations, which is not what doctors learned in school. Many doctors ignore the data and do what they learned in school, because they don't want to give up their own decision making in favor of data-based decisions.

Data collection and data mining make it possible to sell auto insurance online without ever seeing the customer in person: applicants answer questions about their age, gender,
etc., and the insurer decides whether they will be high risk or not.

Moneyball: baseball moved from scouting reports to metrics (scouting reports suffered from appearance bias).

Comparison of Queries: SQL vs. OLAP vs. Data Mining
TEST QUESTION BASED ON THIS SLIDE: what kinds of questions would you ask with each type of query?
OLAP analyzes historical phenomena, while data mining uses historical data to try to predict the future. OLAP (online analytical processing) asks WHY something happened, not just what happened (answering "what happened" is the job of an ordinary SQL query). Data mining is more predictive: it says what WILL happen.

With OLAP, if something doesn't look right, don't just shrug your shoulders and assume it's a fluke. Take a look: is it a one-time event, or a pattern or trend? For example, when mortgage rates fall, housing sales tend to go up. Right now mortgage rates are low but sales aren't increasing, so we are trying to figure out whether this is a one-time event or a pattern.

OLAP operates on multidimensional data cubes: subject (dimension) tables feed into a fact table. You are typically looking at summary, aggregate numbers, such as total sales in a particular store or sales of paper products in that store. Suppose paper product sales in a store are low. You can drill down to see which paper product has low sales; maybe there was a fire at the plant that makes it. In the tool, clicking a blue link gives you more detail, and you can keep doing that to keep drilling down.

The MicroStrategy Wisdom tool looks at Facebook data (likes and dislikes) and compiles it. For example, Coca-Cola fans are likely to also enjoy Oreos and Pringles. Comparing people who like Pepsi with people who like Coca-Cola: up to age 35, people prefer Coke; after 35, people shift toward Pepsi.

Data Mining Models and Tasks
There are lots of data mining tools, and each one operates a little differently.

Example of a directed data mining technique: classification. "Directed" means the outcomes are predefined, discrete classes. You are trying to put an unknown data point into a bucket; the buckets can be yes/no or a/b/c. When a new data point comes along, the analysis decides which bucket to put it in. If you apply for insurance, the insurer can analyze your information and decide to put you in the bucket
where you'll get an insurance plan or the bucket where you won't. You have to build the predefined classes, and the machine learns the characteristics of each class so it knows where to put each data point.

Example: if we hire a new professor, will they be tenured or not? We have a lot of data collected about professors, and we build a model by running that data (the training set) through the algorithm. There are two classes: tenured = yes and tenured = no. The resulting model is: if rank = professor or years of service > 6, then tenured = YES. (See the slide picture of this process.)

Next, take ANOTHER set of data points that wasn't used in the training set, run it through the model, and compare: the model predicted the correct answer 3 out of 4 times, so it is correct 75% of the time. Then use the model for prediction: new data comes in for Jeff, a professor, and we predict he will be tenured, with 75% confidence (the model's accuracy on the test data).

Decision Trees (like the pictures in the previous slides)
Example: a model predicting whether you'll play golf based on the weather. There are 2 buckets, play or don't play, and the attributes are outlook, temperature, humidity, and windy.
- If overcast: PLAY.
- If sunny: check humidity. If high, go home; if normal, PLAY.
- If rain: check wind. If not windy, PLAY; if windy, go home.

Information gain: some attributes are more valuable than others for deciding where to split. Temperature is not really valuable (it doesn't appear in the tree at all); outlook is the most valuable, which is why the tree checks it first.

Example of an undirected technique: association, the opposite situation. There are NO predefined classes to fit the data into; you are looking for some sort of connection. Association takes existing data and tries to see whether things go together, i.e., whether there is an association between items. In retail it is often called market basket analysis (MBA). People who buy beer tend to buy things that go with beer, like peanuts. The general form is: if I buy A, there is some likelihood that I will also buy B. Example, selling DVDs: if you have the Gladiator and Patriot DVDs, the odds that you'll buy the Braveheart DVD are 70%.

MBA is often used for retail store layouts, where every item is placed in a particular spot for a particular reason. In a grocery store, milk is in one corner and bread is in the opposite
corner, because they have the highest correlation of any two products on the planet. The store knows you'll still walk across the store to get the other one. Very intelligent, and all data driven.

Medical example: doctors were prescribing two types of tests that overlapped.

Lipstick and the economy: when the economy slows down, women start buying more lipstick instead of dresses, so rising lipstick sales are a sign that the economy is getting bad. Also, when lipstick sales go up, the colors become brighter and more cheerful.

How good is an association rule?
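Two standard measures of how good an association rule is are support (the fraction of all baskets that contain every item in the rule) and confidence (of the baskets that contain the "if" items, the fraction that also contain the "then" items). The 70% Gladiator and Patriot to Braveheart figure in the DVD example reads as a confidence. Below is a minimal sketch in Python, assuming each basket is a set of item names. The basket data is hypothetical, constructed so the confidence comes out to 70%, and `support`/`confidence` are illustrative helper names, not a library API.

```python
from typing import List, Set

# Hypothetical purchase history: each basket is the set of DVDs one customer bought.
BASKETS: List[Set[str]] = (
    [{"Gladiator", "Patriot", "Braveheart"}] * 7
    + [{"Gladiator", "Patriot"}] * 3
    + [{"Braveheart", "Other DVD"}] * 5
    + [{"Other DVD"}] * 5
)


def support(baskets: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of all baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(baskets: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    matching = [basket for basket in baskets if antecedent <= basket]
    if not matching:
        return 0.0
    return sum(consequent <= basket for basket in matching) / len(matching)


rule_if = {"Gladiator", "Patriot"}
rule_then = {"Braveheart"}
print(f"support    = {support(BASKETS, rule_if | rule_then):.2f}")    # 7 of 20 baskets
print(f"confidence = {confidence(BASKETS, rule_if, rule_then):.2f}")  # 7 of 10 baskets
```

A rule with high confidence but tiny support rests on very few baskets, which is why the two numbers are usually reported together.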
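Because market basket analysis is undirected, nothing tells the algorithm which items to look at, so a natural first step is simply counting which pairs of items show up together in the same basket (like milk and bread in the grocery example). A minimal sketch in Python, assuming each transaction is a set of product names; the grocery baskets are hypothetical and `pair_counts` is an illustrative name.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def pair_counts(baskets: Iterable[Set[str]]) -> Counter:
    """Count, for every unordered pair of items, how many baskets contain both."""
    counts: Counter = Counter()
    for basket in baskets:
        # sorted() gives each pair one canonical order, so (bread, milk) == (milk, bread)
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


# Hypothetical grocery transactions.
BASKETS = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"beer", "peanuts"},
    {"milk", "bread", "beer", "peanuts"},
    {"beer", "chips"},
]

counts = pair_counts(BASKETS)
print(counts.most_common(1))  # [(('bread', 'milk'), 3)]
```

Frequent pairs found this way are only candidate rules ("if milk, then bread"); they still need to be scored before anyone rearranges a store around them.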
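The OLAP drill-down described in these notes, from a summary cell such as paper-product sales in one store down to individual paper products, amounts to re-aggregating the fact table at a finer grain. A minimal sketch in Python, assuming a tiny in-memory fact table with hypothetical stores, products, and sales figures; a real OLAP tool does this against a data cube, and `roll_up`/`drill_down` are illustrative names rather than any tool's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple


class Sale(NamedTuple):
    store: str
    category: str
    product: str
    amount: int


# Hypothetical fact table rows (one sales total per store/product).
FACT_SALES: List[Sale] = [
    Sale("Store 1", "Paper", "Paper towels", 1200),
    Sale("Store 1", "Paper", "Napkins", 150),
    Sale("Store 1", "Paper", "Tissues", 1100),
    Sale("Store 1", "Snacks", "Chips", 2500),
    Sale("Store 2", "Paper", "Paper towels", 1300),
    Sale("Store 2", "Paper", "Napkins", 1000),
    Sale("Store 2", "Paper", "Tissues", 1050),
]


def roll_up(rows: List[Sale], key: Callable[[Sale], Hashable]) -> Dict[Hashable, int]:
    """Sum sales amounts, grouped by whatever key(row) returns."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for row in rows:
        totals[key(row)] += row.amount
    return dict(totals)


def drill_down(rows: List[Sale], store: str, category: str) -> Dict[Hashable, int]:
    """Go one level below a (store, category) cell: sales per product."""
    cell = [r for r in rows if r.store == store and r.category == category]
    return roll_up(cell, key=lambda r: r.product)


summary = roll_up(FACT_SALES, key=lambda r: (r.store, r.category))
print(summary)  # Store 1 paper (2450) looks low next to Store 2 paper (3350)
detail = drill_down(FACT_SALES, "Store 1", "Paper")
print(min(detail, key=detail.get))  # Napkins: the product dragging the total down
```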
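The tenure example can be written out end to end: the rule the notes give as the model, a held-out test set, and the accuracy check behind the 75% figure. The rule comes from the notes; the four test records (names, ranks, years, and actual outcomes) are hypothetical, chosen only so the model gets 3 of 4 right as described, and Jeff's years of service are likewise assumed. A minimal sketch in Python:

```python
from typing import Callable, List, NamedTuple, Optional


class Faculty(NamedTuple):
    name: str
    rank: str
    years: int
    tenured: Optional[bool]  # None when the outcome is not known yet


def predict_tenured(f: Faculty) -> bool:
    """Model from the notes: IF rank = professor OR years > 6 THEN tenured = yes."""
    return f.rank == "Professor" or f.years > 6


# Hypothetical test set: records NOT used to build the model.
TEST_SET: List[Faculty] = [
    Faculty("Tom", "Assistant Prof", 2, False),
    Faculty("Merlisa", "Associate Prof", 7, False),  # model says yes: the one miss
    Faculty("George", "Professor", 5, True),
    Faculty("Joseph", "Assistant Prof", 7, True),
]


def accuracy(model: Callable[[Faculty], bool], records: List[Faculty]) -> float:
    """Fraction of labeled records where the model's prediction matches the truth."""
    correct = sum(model(r) == r.tenured for r in records)
    return correct / len(records)


print(accuracy(predict_tenured, TEST_SET))  # 0.75

# New, unlabeled data point (years of service assumed for illustration).
jeff = Faculty("Jeff", "Professor", 4, None)
print(predict_tenured(jeff))  # True
```

Treating the test-set accuracy as the "confidence" of a single prediction is a simplification: 75% describes how often the model was right on unseen cases, not a probability computed for Jeff specifically.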
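The golf decision tree translates directly into nested conditionals, one branch per node. A minimal sketch in Python, assuming outlook is one of "sunny", "overcast", or "rain", humidity is "high" or "normal", and wind is a boolean; the function name `play_golf` is illustrative. Temperature is not a parameter because the tree never tests it.

```python
def play_golf(outlook: str, humidity: str, windy: bool) -> bool:
    """Walk the decision tree from the notes; True means PLAY, False means go home."""
    if outlook == "overcast":
        return True                  # overcast: always play
    if outlook == "sunny":
        return humidity == "normal"  # sunny: play only if humidity is normal
    if outlook == "rain":
        return not windy             # rain: play only if it is not windy
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play_golf("sunny", "high", False))   # False
print(play_golf("rain", "normal", False))  # True
```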
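The information-gain ranking can be checked numerically. Information gain is the drop in entropy, H = -sum(p * log2(p)), of the play/don't-play labels after splitting on an attribute. The notes do not include the training data, so this sketch assumes the classic 14-record weather toy dataset often used to teach decision trees (same attributes as the golf example; the tree in the notes classifies all 14 records correctly). With that assumed data, outlook has the highest gain and temperature the lowest, matching the notes. A minimal sketch in Python; all names are illustrative.

```python
import math
from collections import Counter

# Assumed toy dataset: (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rain", "mild", "high", False, "yes"),
    ("rain", "cool", "normal", False, "yes"),
    ("rain", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rain", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rain", "mild", "high", True, "no"),
]
ATTRIBUTES = {"outlook": 0, "temperature": 1, "humidity": 2, "windy": 3}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, col):
    """Entropy of the labels minus the weighted entropy after splitting on column col."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


gains = {name: information_gain(WEATHER, col) for name, col in ATTRIBUTES.items()}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {gain:.3f}")
# outlook 0.247, humidity 0.152, windy 0.048, temperature 0.029 (rounded)
```

This is the splitting criterion used by ID3-style tree builders: the attribute with the largest gain goes at the root, which is why outlook is checked first.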