MIS 385/MBA 664 Systems Implementation with DBMS/ Database ManagementObjectivesImportance of Data QualityCharacteristics of Quality DataCauses of poor data qualityData quality improvementImproving Data Capture ProcessesData Stewardship ProgramPrinciples for High Quality Data ModelsExample of a many-to-many relationship as an entity typeData IntegrationTechniques for Data IntegrationComparing Consolidation, Federation, & Propagation as Forms of Data IntegrationMaster Data Management (MDM)Before ETL, operational data is…After ETL, data should be…The ETL ProcessCapture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouseScrub/Cleanse…uses pattern recognition and AI techniques to upgrade data qualityTransform = convert data from format of operational system to format of data warehouseSlide 21Single-field transformationMulti-field transformationSamples of Tools to Support Data Reconciliation and IntegrationMIS 385/MBA 664Systems Implementation with DBMS/Database ManagementDave [email protected] (email)http://www.davesalisbury.com/ (web site)ObjectivesDefinition of termsDescribe importance and measures of data qualityDefine characteristics of quality dataDescribe reasons for poor data quality in organizationsDescribe a program for improving data qualityDescribe three types of data integration approachesDescribe the purpose and role of master data managementDescribe four steps and activities of ETL for data integration for a data warehouseExplain various forms of data transformation for data warehousesImportance of Data QualityMinimize IT project riskMake timely business decisionsEnsure regulatory complianceExpand customer baseCharacteristics of Quality DataUniquenessAccuracyConsistencyCompletenessTimelinessCurrencyConformanceReferential integrityCauses of poor data qualityExternal data sourcesLack of control over data qualityRedundant data storage and inconsistent metadataProliferation of databases with uncontrolled redundancy and metadataData entryPoor data capture controlsLack of organizational commitmentDo not recognize poor data quality as an organizational issueData quality improvementPerform data quality auditImprove data capture processesEstablish data stewardship programApply total quality management (TQM) practicesApply modern DBMS technologyEstimate return on investmentStart with a high-quality data modelImproving Data Capture ProcessesAutomate data entry as much as possibleManual data entry should be selected from preset optionsUse trained operators when possibleFollow good user interface design principlesImmediate data validation for entered dataData Stewardship ProgramData stewardA person responsible for ensuring that organizational applications properly support the organization’s data quality goalsData governanceHigh-level organizational groups and processes overseeing data stewardship across the organizationPrinciples for High Quality Data ModelsEntity types represent underlying nature of an objectEntity types part of subtype/supertype hierarchy for universal contextActivities and associations represented by (event) entity types, not relationshipsRelationships used to represent only involvement of entity types with activities or associationsCandidate attributes suspected of representing relationships to other entity typesEntity types should have a single attribute as the primary unique identifierExample of a many-to-many relationship as an entity typeData IntegrationData integration creates a unified view of business dataOther possibilities:Application integrationBusiness process integrationUser interaction integrationAny approach required changed data capture (CDC)Indicates which data have changed since previous data integration activityTechniques for Data IntegrationConsolidation (ETL)Consolidating all data into a centralized database (like a data warehouse)Data federation (EII)Provides a virtual view of data without actually creating one centralized databaseData propagation (EAI and ERD)Duplicate data across databases, with near real-time delayComparing Consolidation, Federation, & Propagation as Forms of Data IntegrationMaster Data Management (MDM)The disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areasThree main approachesIdentity registryIntegration hubPersistentBefore ETL, operational data is…Transient–not historicalNot normalized (perhaps due to denormalization for performance)Restricted in scope–not comprehensiveSometimes poor quality–inconsistencies and errorsAfter ETL, data should be…Detailed–not summarized yetHistorical–periodicNormalized–3rd normal form or higherComprehensive–enterprise-wide perspectiveTimely–data should be current enough to assist decision-makingQuality controlled–accurate with full integrityThe ETL ProcessCapture/ExtractScrub or data cleansingTransformLoad and IndexETL = Extract, transform, and loadStatic extract = capturing a snapshot of the source data at a point in timeIncremental extract = capturing changes that have occurred since the last static extractCapture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouseFixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistenciesAlso: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing dataScrub/Cleanse…uses pattern recognition and AI techniques to upgrade data qualityRecord-level:Selection–data partitioningJoining–data combiningAggregation–data summarizationField-level: singl e-field–from one field to one fieldmulti-field–from many fields to one, or one field to manyTransform = convert data from format of operational system to format of data warehouse21Load/Index= place transformed data into the warehouse and create indexesRefresh mode:Refresh mode: bulk rewriting of target data at periodic intervalsUpdate mode:Update mode: only changes in source data are written to data warehouseFigure 12-2 Steps in data reconciliation(cont.)In general–some transformation
View Full Document