
The Data Warehouse Environment

Outline: Data Warehouse Usage; Why Separate Data Warehouse?; What are Operational Systems?; RDBMS used for OLTP; Examples of Operational Data; So, what’s different?; OLTP vs. Data Warehouse; OLTP vs Data Warehouse; To Summarize; Why Now?; Subject Orientation; Application-Orientation vs. Subject-Orientation; Integrated Data; Time; Data Warehouse Architecture; Components of the Warehouse; Loading the Warehouse; Data Extraction and Cleansing; Source Data; Data Quality: The Reality; Data Integration Across Sources; Data Integrity Problems; Scrubbing Data; Loads; Structuring/Modeling Issues; Data: Heart of the Data Warehouse; Data Warehouse Structure; Data Warehouse Design Issues; Granularity; Data Granularity in Warehouse; Granularity in Warehouse; Granularity and Data Analysis; Dual Levels of Granularity; Partitioning; Structuring Data in the DW; Purging Warehouse Data; Data Warehouse vs. Data Marts; From the Data Warehouse to Data Marts; Data Warehouse and Data Marts; Data Mart Centric; Problems with Data Mart Centric Solution; True Warehouse; Dimensional Modeling Vocabulary; Dimension Tables; Fact Table; Star Join Schema; Metadata Repository; Metadata Repository (continued); Recipe for a Successful Warehouse; For a Successful Warehouse; Data Warehouse Pitfalls; Acknowledgements

Data Warehouse Usage
Three kinds of data warehouse applications:
- Information processing: supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs.
- Analytical processing and interactive analysis: multidimensional analysis of data warehouse data; supports basic OLAP operations such as slice-and-dice, drilling and pivoting.
- Data mining: knowledge discovery from hidden patterns; supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.

Why Separate Data Warehouse?
Performance
- Operational databases are designed and tuned for known OLTP uses and workloads.
- Complex OLAP queries would degrade their performance.
- Special data organization, access and implementation methods are needed for multidimensional views and queries.
Function
- Missing data: decision support requires historical data, which operational databases do not typically maintain.
- Data consolidation: decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources, such as operational databases and external sources.
- Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.

What are Operational Systems?
- They are OLTP systems.
- They run mission-critical applications.
- They must meet stringent performance requirements for routine tasks.
- They run the business in real time.
- They are based on up-to-the-second data.
- They are optimized to handle large numbers of simple read/write transactions.
- They are optimized for fast response to predefined transactions.
- They are used by people who deal with customers and products: clerks, salespeople, etc.
- They are increasingly used by customers themselves.

RDBMS used for OLTP
Database systems have traditionally been used for OLTP:
- clerical data processing tasks
- detailed, up-to-date data
- structured, repetitive tasks
- reading/updating a few records

Examples of Operational Data
Data | Industry | Usage | Technology | Volumes
Customer File | All | Track customer details | Legacy applications, flat files, mainframes | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale Data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium

So, what’s different?

OLTP vs. Data Warehouse
- OLTP systems are tuned for known transactions and workloads, while the workload of a data warehouse is not known a priori.
- Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries), e.g., the average amount spent on phone calls between 9 AM and 5 PM in Charlotte during the month of December.

OLTP vs Data Warehouse
OLTP | Warehouse (DSS)
Application oriented | Subject oriented
Used to run the business | Used to analyze the business
Detailed data | Summarized and refined data
Current, up to date | Snapshot data
Isolated data | Integrated data
Repetitive access | Ad-hoc access
Clerical user | Knowledge user (manager)

OLTP vs Data Warehouse
OLTP | Data Warehouse
Performance sensitive | Performance relaxed
Few records accessed at a time (tens) | Large volumes accessed at a time (millions)
Read/update access | Mostly read (batch update)
No data redundancy | Redundancy present
Database size 100 MB to 100 GB | Database size 100 GB to a few terabytes

OLTP vs Data Warehouse
OLTP | Data Warehouse
Transaction throughput is the performance metric | Query throughput is the performance metric
Thousands of users | Hundreds of users
Managed in its entirety | Managed by subsets

To Summarize
- OLTP systems are used to “run” a business.
- The data warehouse helps to “optimize” the business.

Why Now?
- Data is being produced.
- ERP provides clean data.
- The computing power is available.
- The computing power is affordable.
- The competitive pressures are strong.
- Commercial products are available.

Subject Orientation
- The DW is organized by the major subject areas and entities of the business organization.
- The data warehouse model aligns with the corporate logical data model.
- Example major subject areas for insurance: Customer, Product, Transaction Activity, Claim, Policy, Account.

Application-Orientation vs. Subject-Orientation
Application-orientation (operational database): Loans, Credit Card, Trust, Savings
Subject-orientation (data warehouse): Customer, Vendor, Product, Activity

Integrated Data
- There is no application consistency in the operational data.
- As data from different systems is entered into the DW, entities and attributes are encoded using a consistent key or measurement.

Time
- A data warehouse is nothing more than a sophisticated series of snapshots, each taken at one moment in time.
- The key structure of the DW always contains some element of time.

Data Warehouse Architecture
[Figure: warehouse architecture. Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems. Components: Extraction, Cleansing, Optimized Loader, Data Warehouse Engine, Metadata Repository. Uses: Analyze, Query.]

Components of the Warehouse
Data Extraction
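As a rough illustration of how extraction and cleansing turn inconsistent operational data into integrated, time-stamped warehouse data, and how the warehouse can then answer the multidimensional query cited earlier (average amount spent on phone calls between 9 AM and 5 PM in Charlotte during December), here is a minimal Python sketch. The two source extracts, the city code table, the field names and the helper functions are all hypothetical; a production warehouse would typically use ETL tooling and SQL rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean

# Hypothetical raw extracts from two operational systems that encode the
# same facts inconsistently (city codes vs. city names, cents vs. dollars,
# different timestamp formats).
BILLING_EXTRACT = [
    {"cust": "C1", "city": "CLT", "start": "2023-12-04 10:15", "amount_cents": 450},
    {"cust": "C2", "city": "CLT", "start": "2023-12-05 19:30", "amount_cents": 900},
]
LEGACY_EXTRACT = [
    ("C3", "Charlotte", "12/11/2023 14:05", 6.50),
    ("C4", "Raleigh", "12/11/2023 09:45", 3.00),
]

# Assumed code table used during cleansing to reach one consistent encoding.
CITY_CODES = {"CLT": "Charlotte", "RAL": "Raleigh"}


@dataclass(frozen=True)
class CallFact:
    snapshot: date      # element of time that is part of the warehouse key
    customer: str
    city: str           # always a full city name after cleansing
    start: datetime
    amount_usd: float   # always dollars after cleansing


def cleanse_billing(row, snapshot):
    """Map a billing-system row onto the integrated representation."""
    return CallFact(snapshot, row["cust"], CITY_CODES.get(row["city"], row["city"]),
                    datetime.strptime(row["start"], "%Y-%m-%d %H:%M"),
                    row["amount_cents"] / 100)


def cleanse_legacy(row, snapshot):
    """Map a legacy-system tuple onto the integrated representation."""
    cust, city, start, amount = row
    return CallFact(snapshot, cust, CITY_CODES.get(city, city),
                    datetime.strptime(start, "%m/%d/%Y %H:%M"), float(amount))


def load(snapshot):
    """Extract from both sources, cleanse, and return one snapshot of facts."""
    facts = [cleanse_billing(r, snapshot) for r in BILLING_EXTRACT]
    facts += [cleanse_legacy(r, snapshot) for r in LEGACY_EXTRACT]
    return facts


def avg_business_hours_spend(facts, city, month):
    """Average call amount for one city and month, calls starting 9:00-16:59."""
    amounts = [f.amount_usd for f in facts
               if f.city == city and f.start.month == month
               and 9 <= f.start.hour < 17]
    return mean(amounts) if amounts else None


warehouse = load(date(2024, 1, 1))
print(avg_business_hours_spend(warehouse, "Charlotte", 12))
```

Running it prints 5.5: only the two Charlotte calls placed in December during business hours (4.50 and 6.50 dollars) qualify, even though they came from different sources with different encodings. Treating "9 AM to 5 PM" as a start hour with 9 <= hour < 17 is an assumption made for the sketch.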



UNCC MBAD 6201 - The Data Warehouse Environment
