The Data Warehouse EnvironmentData Warehouse UsageWhy Separate Data Warehouse?What are Operational Systems?RDBMS used for OLTPExamples of Operational DataSo, what’s different?OLTP vs. Data WarehouseOLTP vs Data WarehouseSlide 10Slide 11To summarize ...Why Now?Subject OrientationApplication-Orientation vs. Subject-OrientationIntegrated dataTimeData Warehouse ArchitectureComponents of the WarehouseLoading the WarehouseData Extraction and CleansingSource DataData Quality - The RealityData Integration Across SourcesData Integrity ProblemsSlide 26Scrubbing DataLoadsStructuring/Modeling IssuesData -- Heart of the Data WarehouseData Warehouse StructureData Warehouse Design IssuesGranularityData Granularity in WarehouseGranularity in WarehouseGranularity and Data AnalysisSlide 37Slide 38Dual Levels of GranularityPartitioningSlide 41Slide 42Structuring Data in the DWPurging Warehouse DataData Warehouse vs. Data MartsFrom the Data Warehouse to Data MartsData Warehouse and Data MartsData Mart CentricProblems with Data Mart Centric SolutionTrue WarehouseDimensional Modeling VocabularyDimension TablesFact TableStar Join SchemaMetadata RepositoryMetadata Repository .. 2Recipe for a Successful WarehouseFor a Successful WarehouseSlide 59Slide 60Data Warehouse PitfallsSlide 62Slide 63AcknowledgementsThe Data Warehouse EnvironmentData Warehouse UsageThree kinds of data warehouse applicationsInformation processingsupports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphsAnalytical processing and Interactive Analysismultidimensional analysis of data warehouse datasupports basic OLAP operations, slice-dice, drilling, pivotingData miningknowledge discovery from hidden patterns supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization toolsWhy Separate Data Warehouse?PerformanceOp dbs designed & tuned for known OLTP uses & workloads.Complex OLAP queries would degrade performance.Special data organization, access & implementation methods needed for multidimensional views & queries. FunctionMissing data: Decision support requires historical data, which op dbs do not typically maintain.Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: op dbs, external sources. Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.What are Operational Systems?They are OLTP systemsRun mission critical applicationsNeed to work with stringent performance requirements for routine tasksRun the business in real timeBased on up-to-the-second dataOptimized to handle large numbers of simple read/write transactionsOptimized for fast response to predefined transactionsUsed by people who deal with customers, products -- clerks, salespeople etc.They are increasingly used by customersRDBMS used for OLTPDatabase Systems have been used traditionally for OLTPclerical data processing tasksdetailed, up to date datastructured repetitive tasksread/update a few recordsExamples of Operational DataData IndustryUsage Technology VolumesCustomerFileAll TrackCustomerDetailsLegacy application, flatfiles, main framesSmall-mediumAccountBalanceFinance ControlaccountactivitiesLegacy applications,hierarchical databases,mainframeLargePoint-of-Sale dataRetail Generatebills, managestockERP, Client/Server,relational databasesVery LargeCallRecordTelecommu-nicationsBilling Legacy application,hierarchical database,mainframeVery LargeProductionRecordManufactu-ringControlProductionERP,relational databases,AS/400MediumSo, what’s different?OLTP vs. Data WarehouseOLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouseSpecial data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)e.g., average amount spent on phone calls between 9AM-5PM in Charlotte during the month of DecemberOLTP vs Data WarehouseOLTPApplication OrientedUsed to run businessDetailed dataCurrent up to dateIsolated DataRepetitive accessClerical UserWarehouse (DSS)Subject OrientedUsed to analyze businessSummarized and refinedSnapshot dataIntegrated DataAd-hoc accessKnowledge User (Manager)OLTP vs Data WarehouseOLTPPerformance SensitiveFew Records accessed at a time (tens)Read/Update AccessNo data redundancyDatabase Size 100MB -100 GBData WarehousePerformance relaxedLarge volumes accessed at a time(millions)Mostly Read (Batch Update)Redundancy presentDatabase Size 100 GB - few terabytesOLTP vs Data WarehouseOLTPTransaction throughput is the performance metricThousands of usersManaged in entiretyData WarehouseQuery throughput is the performance metricHundreds of usersManaged by subsetsTo summarize ...OLTP Systems are used to “run” a businessThe Data Warehouse helps to “optimize” the businessWhy Now?Data is being producedERP provides clean dataThe computing power is availableThe computing power is affordableThe competitive pressures are strongCommercial products are availableSubject OrientationDW is organized by major subject areas and entities of the business organizationData warehouse model aligns with the corporate logical data modelExample of major subject areas for InsuranceCustomerProductTransaction activityClaimPolicyAccountApplication-Orientation vs. Subject-OrientationApplication-OrientationOperational DatabaseLoansCredit CardTrustSavingsSubject-OrientationDataWarehouseCustomerVendorProductActivityIntegrated dataThere is not application consistency in the operational data As data from different systems is entered into the DW, entities and attributes are encoded using a consistent key or measurementTimeData warehouse is nothing more than a sophisticated series of snapshots, taken at one moment in timeThe key structure of the DW always contains some element of timeData Warehouse ArchitectureData Warehouse EngineOptimized LoaderExtractionCleansingAnalyzeQueryMetadata RepositoryRelationalDatabasesLegacyDataPurchased DataERPSystemsComponents of the WarehouseData Extraction
View Full Document