DOC PREVIEW
U of I CS 511 - Data Warehouse & Online Analytical

This preview shows page 1-2 out of 6 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

The story of the Walton s the richest family ever Data Warehouse Online Analytical Processing OLAP Aditya Ramani Arpit Jain Kashif Manzoor Omid Fatemieh 2 Need to the day What are the items I am running out of SELECT FROM PRODUCTS WHERE Quantity 200 1 Present data that can help him in making wise decision Okay now color code them so that anything below 100 is dark read and all others are white Show me the sum of all the individual transactions SELECT Transaction TID SUM Product Price FROM Transaction Product LineItem WHERE LineItem TID Transaction TID AND LineItem Product Product ID GROUP BY Transaction TID 2 Keep historical data so that he can perform trend analysis on the data 3 Keep the analysis data up to date 4 Don t overload him with lot of irrelevant data 5 Use the Keep It Simple Stupid rule just present what is of interest to him Show me all the items that are usually purchased together I can then place them close to each other on the shelves 6 Let him visualize the data in different ways as per his choice I had put Printer cartridge on sale in the hope that customer will buy a printer Show me if this strategy worked Now the Walmart manager is getting too fancy with the requests 3 4 Agenda What is Data Warehouse Introduction Data Warehouse OLAP Future Trends Thankyou to the class for their feedback on the newsgroup Theme Data warehousing provides architectures and tools for business executives to systematically organize understand and use their data to make strategic decisions Jiawei Han A data warehouse is a subject oriented integrated time variant and nonvolatile collection of data in Less slides more interaction more diagrams less convoluted equations more talk less sleepy faces hopefully this will lead to less pain support of management s decision making process William H Inmon All Relevant Info under one roof Changes as new data trickle in 5 To the point Retains the history 6 1 What is Online Analytical Processing OLAP DATA WAREHOUSE 7 Why can t the Existing DBMS do the trick 8 RDBMS and Data Warehouse no love lost Mundane day to day entries do not help in decision support Operational in nature Online Transaction We need a much higher and aggregated view of data Processing OLTP oriented Information that I could have obtained from a single table Normalized shouldn t be split into several tables Multipurpose support insert delete update I do not need to insert delete update the data I only need to view it select Support Transaction Management Why waste your energy on these features Recovery Locking etc Restrict I the size of data by dumping would need the historical data for my analysis historical data into archives Now these what everthings is left is something that I can use So why are bad not needed They are not at odds with each other RDBMS can be a placeholder for DataWarehouse to store huge amount of read only data Traditional RDBMS are made to handle operational data efficiently In their raw form they are not very efficient in handling huge de normalized read only data Many RDBMSs provide a DBMS version fully targeted towards a Data Warehouse 9 Data warehouse architecture Other sources Operational DBs Metadata Extract Transform Load Refresh Monitor Integrator Data Warehouse Terminologies Dimension city OLAP Server Data Warehouse Serve 10 The angle from which the data is to be analyzed Analysis Query Reports Data mining Measures e g sales profit The consolidated value of a particular dimension measured against one or more dimensions Can be a sum an average a median etc Multidimensional Analysis city and month Analyzing measures that have been aggregated against more than one dimensions Data Marts Data Sources Data Storage OLAP Engine Front End Tools 11 12 2 Cuboids from a SQL point of view Data Warehouse Terminologies cont d T Data Cube Cuboid Product City SELECT SUM PRICE FROM T SELECT Product SUM PRICE FROM T GROUP BY Product Example What is the total amount sale per month in the Walmarts in each city City Date Price SELECT City SUM PRICE FROM T GROUP BY City 13 Price Cuboids Each of the possible group by combination of dimensions is called a Cuboid the collection of all possible cuboids is called a Data Cube Table T Product Date 1 2 3 4 5 6 7 8 SELECT City SUM PRICE FROM T GROUP BY City Product City Date Product City Product Date City Date Product City Date SELECT City Date SUM PRICE FROM T GROUP BY City Date SELECT Product City Date SUM PRICE FROM T GROUP BY Product City Date 14 DW Data Model For n dimensions total cuboids would be 2n exponential growth The Star Schema time item time key day day of the week month quarter year item key item name brand type supplier type Sales Fact Table time key item key branch key location branch location key branch key branch name branch type location key street city state or province country units sold dollars sold Data Cube lattice of cuboids Notice that the word Cube does not necessarily mean 3 Dimensions avg sales Measures Dimensions 15 16 Data warehouse architecture The Snowflake Schema time time key day day of the week month quarter year Sales Fact Table time key item key item supplier item key item name brand type supplier key supplier key supplier type Other sources Operational DBs branch key branch branch key branch name branch type location location key units sold dollars sold avg sales location key street city key Metadata Extract Transform Load Refresh Monitor Integrator OLAP Server Data Warehouse Serve Analysis Query Reports Data mining city city key city state or province country 17 Data Marts Data Sources Data Storage OLAP Engine Front End Tools 18 3 Issues in DW Construction Issues in DW Construction cont d Issues Challenges Extract Transform Load ETL Refresh Curse of Dimensionality n Data Cleaning Missing data optional fields etc n dimensions result into 2 aggregated cells Data Loading Sorting aggregation building indexes Queries are complex Query optimization a real challenge Refresh Propagate update on source data to the data warehouse Recreate indexes aggregate tables Frequency Refreshing Methods Data shipping has more overhead Transaction shipping not portable Index may not be used Due to the complexity of the queries index may not be used 19 20 Materializing Cuboids Indexing OLAP Data Bitmap Index The user wants to see the results immediately Materializing all cuboids is really expensive or even impossible Alternatives 1 Only materialize the most specific cuboid Index on a particular column Each value


View Full Document

U of I CS 511 - Data Warehouse & Online Analytical

Download Data Warehouse & Online Analytical
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Data Warehouse & Online Analytical and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Data Warehouse & Online Analytical and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?