UNCC MBAD 6201 - The Data Warehouse Environment (64 pages)

Previewing pages 1, 2, 3, 4, 30, 31, 32, 33, 34, 61, 62, 63, 64 of 64 page document View the full content.
View Full Document

The Data Warehouse Environment



Previewing pages 1, 2, 3, 4, 30, 31, 32, 33, 34, 61, 62, 63, 64 of actual document.

View the full content.
View Full Document
View Full Document

The Data Warehouse Environment

37 views


Pages:
64
School:
University of North Carolina - Charlotte
Course:
Mbad 6201 - Business Intelligence and Analytics
Business Intelligence and Analytics Documents
Unformatted text preview:

The Data Warehouse Environment Data Warehouse Usage Three kinds of data warehouse applications Information processing supports querying basic statistical analysis and reporting using crosstabs tables charts and graphs Analytical processing and Interactive Analysis multidimensional analysis of data warehouse data supports basic OLAP operations slice dice drilling pivoting Data mining knowledge discovery from hidden patterns supports associations constructing analytical models performing classification and prediction and presenting the mining results using visualization tools Why Separate Data Warehouse Performance Op dbs designed tuned for known OLTP uses workloads Complex OLAP queries would degrade performance Special data organization access implementation methods needed for multidimensional views queries Function Missing data Decision support requires historical data which op dbs do not typically maintain Data consolidation Decision support requires consolidation aggregation summarization of data from many heterogeneous sources op dbs external sources Data quality Different sources typically use inconsistent data representations codes and formats which have to be reconciled What are Operational Systems They are OLTP systems Run mission critical applications Need to work with stringent performance requirements for routine tasks Run the business in real time Based on up to the second data Optimized to handle large numbers of simple read write transactions Optimized for fast response to predefined transactions Used by people who deal with customers products clerks salespeople etc They are increasingly used by customers RDBMS used for OLTP Database Systems have been used traditionally for OLTP clerical data processing tasks detailed up to date data structured repetitive tasks read update a few records Examples of Operational Data Data IndustryUsage Customer All File Technology Volumes Track Legacy application flat Small medium Customer files main frames Details Account Finance Control Legacy applications Large Balance account hierarchical databases activities mainframe Point of Retail Generate ERP Client Server Very Large Sale data bills manage relational databases stock Call Telecommu Billing Legacy application Very Large Record nications hierarchical database mainframe ProductionManufactu Control ERP Medium Record ring Production relational databases AS 400 So what s different OLTP vs Data Warehouse OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse Special data organization access methods and implementation methods are needed to support data warehouse queries typically multidimensional queries e g average amount spent on phone calls between 9AM 5PM in Charlotte during the month of December OLTP vs Data Warehouse OLTP Application Oriented Used to run business Detailed data Current up to date Isolated Data Repetitive access Clerical User Warehouse DSS Subject Oriented Used to analyze business Summarized and refined Snapshot data Integrated Data Ad hoc access Knowledge User Manager OLTP vs Data Warehouse OLTP Performance Sensitive Few Records accessed at a time tens Data Warehouse Read Update Access No data redundancy Database Size 100MB 100 GB Performance relaxed Large volumes accessed at a time millions Mostly Read Batch Update Redundancy present Database Size 100 GB few terabytes OLTP vs Data Warehouse OLTP Transaction throughput is the performance metric Thousands of users Managed in entirety Data Warehouse Query throughput is the performance metric Hundreds of users Managed by subsets To summarize OLTP Systems are used to run a business The Data Warehouse helps to optimize the business Why Now Data is being produced ERP provides clean data The computing power is available The computing power is affordable The competitive pressures are strong Commercial products are available Subject Orientation DW is organized by major subject areas and entities of the business organization Data warehouse model aligns with the corporate logical data model Example of major subject areas for Insurance Customer Product Transaction activity Claim Policy Account Application Orientation vs Subject Orientation Application Orientation Subject Orientation Operation al Database Loans Credit Card Data Warehouse Customer Vendor Trust Savings Product Activity Integrated data There is not application consistency in the operational data As data from different systems is entered into the DW entities and attributes are encoded using a consistent key or measurement Time Data warehouse is nothing more than a sophisticated series of snapshots taken at one moment in time The key structure of the DW always contains some element of time Data Warehouse Architecture Relational Databases Optimized Loader ERP Systems Extraction Cleansing Data Warehouse Engine Purchased Data Legacy Data Metadata Repository Analyze Query Components of the Warehouse Data Extraction and Loading The Warehouse Analyze and Query OLAP Tools Metadata Data Mining tools Loading the Warehouse Extracting cleaning transforming the data before it is loaded Data Extraction and Cleansing Extract data from existing operational and legacy data Issues Sources of data for the warehouse Data quality at the sources Merging different data sources Data transformation How to propagate updates on the sources to the warehouse Terabytes of data to be loaded Source Data Operational Source Data Legacy Relational External Typically host based legacy applications Sequential Customized applications COBOL 3GL 4GL Point of Contact Devices POS ATM Call switches External Sources Need to go through ETL Extract Transform Load Data Quality The Reality Legacy systems not well documented if at all Outside sources with questionable quality procedures Production systems with no built in integrity checks and no integration Operational systems are usually designed to solve a specific business problem and are rarely developed to a corporate plan And get it done quickly we do not have time to worry about corporate standards Data Integration Across Sources Savings Same data different name Loans Different data Same name Trust Data found here nowhere else Credit card Different keys same data Data Integrity Problems Same person different spellings Different account numbers generated by different applications for the same customer Required fields left blank Invalid product codes collected at point of sale manual entry


View Full Document

Access the best Study Guides, Lecture Notes and Practice Exams

Loading Unlocking...
Login

Join to view The Data Warehouse Environment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Data Warehouse Environment and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?